THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


The Comparison of Different Scales of Measurement for Experimental Results. W. G. Cochran

On a Measure Problem Arising in the Theory of Non-Parametric Tests. Henry Scheffé

Further Results on Probabilities of a Finite Number of Events. Kai Lai Chung

On the Problem of Testing Hypotheses. R. v. Mises

On the Reliability of the Classical Chi-Square Test. E. J. Gumbel

A Sampling Inspection Plan for Continuous Production. H. F.

On the Theory of Runs with Some Applications to Quality Control. J. Wolfowitz

The Accuracy of Sampling Methods in Ecology. Paul G. Hoel

News and Notices 

Report on the Washington Meeting of the Institute 

Report on the First Meeting of the Pittsburgh Chapter of the 
Institute 


Vol. XIV, No. 3 — September, 1943 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor
A. T. CRAIG H. HOTELLING 


W. E. DEMING J. NEYMAN 
T. C. FRY W. A. SHEWHART 


WITH THE COOPERATION OF 
W. G. Cochran P. S. Dwyer


J. H. Curtiss C. Eisenhart
H. F. Dodge W. K. Feller


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business com-
munications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Insti- 
tute of Mathematical Statistics, E. G. Olds, Carnegie Institute of Technology, 
Pittsburgh, Pa. Changes in mailing address which are to become effective for 
a given issue should be reported to the Secretary on or before the 15th of the 
month preceding the month of that issue. The months of issue are March, 
June, September and December. Because of war-time difficulties of publica- 
tion, issues may often be from two to four weeks late in appearing. 
Subscribers are therefore requested to wait at least 30 days after month of issue 
before making inquiries concerning non-delivery. 


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS 
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts 
should be typewritten double-spaced with wide margins, and the original copy 
should be submitted. Footnotes should be reduced to a minimum and whenever 
possible replaced by a bibliography at the end of the paper; formulae in foot- 
notes should be avoided. Figures, charts, and diagrams should be drawn on 
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $5.00 per year. Single copies $1.50.
Back numbers are available at $5.00 per volume, or $1.50 per single issue. 


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879 











THE COMPARISON OF DIFFERENT SCALES OF MEASUREMENT FOR
EXPERIMENTAL RESULTS¹,²


By W. G. Cochran
Iowa State College


1. Introduction. In some fields of research, the development of a satisfactory 
method for measuring the effects of experimental treatments constitutes a diffi- 
cult problem. The estimation of the vitamin content of preparations of foods 
furnishes a good example; for most of the vitamins several years of work were 
required to construct a reliable method of assay. In other cases, where the ideal 
method for measuring treatment responses is costly or troublesome, a search 
may be made for a more convenient substitute. Thus in pasture or forage-crop 
experiments the species composition of a plot may be estimated by eye inspection 
as a substitute for a complete botanical separation. As a third example we may 
quote experiments in cookery, where the flavor and quality of the dishes are 
subject to the whims of human taste. Frequently a panel of judges is employed, 
each of whom scores the dishes independently. It is not easy to determine how 
the panel should be chosen, nor how representative its verdicts are of consumer 
preferences in general. 

When such problems are investigated, experiments may be carried out spe- 
cifically for the purpose of comparing two or more methods or scales of measure- 
ment. Where the process of measurement affects only the final stages of the 
experiment, as in the last two examples quoted above, all that is necessary is to 
score the same experiment by the various scales under consideration. In com- 
paring two different methods of assaying vitamins, on the other hand, inde- 
pendent experiments are frequently required, the only common feature being 
that the same set of treatments is tested in both experiments. 

In the interpretation of the results of such experiments, two types of compari- 
son are of general interest. One concerns the relations between the scales. It 
may be summed up rather loosely in the question: Are the effects of the treat- 
ments the same in all scales? For a more exact formulation, consider the case 
of two scales, which is probably the most frequent in practice. Let $\xi_{1t}$, $\xi_{2t}$
be the true means of the $t$th treatment as measured on the two scales. We may
wish to examine the following hypotheses:

(i) Scales equivalent:

(1) $\xi_{1t} = \xi_{2t}$, (all $t$);

(ii) Scales equivalent, apart from a constant difference:

(2) $\xi_{1t} = \xi_{2t} + \epsilon$, (all $t$);




1 Paper presented at a meeting of the Institute of Mathematical Statistics, Washington, 
D. C., June 18, 1943. 

2 Journal Paper No. J-1136 of the Iowa Agricultural Experiment Station, Ames, Iowa. 
Project 514. 




(iii) Scales linearly related:

(3) $\alpha \xi_{1t} + \beta \xi_{2t} = \gamma$, (all $t$);

(iv) Relation monotonic, but not linear:

(4) $\xi_{1t} = f(\xi_{2t}; \alpha, \beta, \cdots)$, (all $t$);

where the function $f$ is strictly monotonic.

In this case the two scales are mutually consistent in that they place any set 
of treatments in the same order. The ratio of a treatment difference in one scale 
to the corresponding difference in the other scale is, however, not constant. 

(v) Relation not monotonic: Here the scales do not place the treatments in the 
same order and consequently are not satisfactory substitutes for each other. 

The second question concerns the relative accuracy or sensitivity of the two 
scales. For practical purposes this question may be put as follows: how many 
replications are required with the second scale to attain the accuracy given by r 
replications with the first scale? It is clear that the answer depends both on the 
experimental errors associated with the scales and on the magnitudes of the 
treatment effects in the two scales. For example, Coward [1] reports that in 
the assay of vitamin D, male rats give a higher experimental error than females, 
yet provide a more accurate assay because they are more responsive. The rela- 
tive accuracy may be different in different parts of the two scales. This is likely 
to happen whenever the relation between the scales is of type (iv) above. 

This paper gives a preliminary discussion of some of the simpler questions 
raised above, to which recent work in multivariate analysis is applicable. A 
complete solution for small sample work appears to demand considerable further 
development in the distribution theory of multivariate analysis. 

The discussion is confined to the case in which all scales measure the same 
experiment. The case where each scale requires a separate experiment may be 
expected to be somewhat simpler, but cannot conveniently be treated as a special 
case of the procedure for a single experiment. 



















2. Assumptions. Let $x_1, x_2, \cdots, x_p$ denote measurements on the $p$ scales
and let $n_1$ and $n_2$ be the numbers of degrees of freedom for treatments and error
respectively. The experimental data furnish a joint analysis of variance and
covariance of the $p$ variates as follows:

(5)
                  d.f.     Sum of squares or products
    Mean           1         $m_{ij}$
    Treatments    $n_1$      $a_{ij}$
    Error         $n_2$      $b_{ij}$


It will be assumed that $x_1, \cdots, x_p$ follow a multivariate normal distribution,
and that for any pair of variates $x_i$, $x_j$ the error mean covariance $\sigma_{ij}$ is constant
throughout the experiment (though it may vary as $i$ and $j$ vary). Thus the
quantities $b_{ij}$ follow the standard joint distribution, Wishart [16], of sums of
squares and products while the quantities $m_{ij}$ and $a_{ij}$ follow the corresponding
non-central distributions and the three sets of distributions are independent.


3. Tests for equivalence. If there are only two scales, a test for equivalence
is obtained from elementary techniques. An analysis of variance similar to (5)
is computed on the differences between the two scales for every observation. If
equations (1) hold in the population, the sums of squares for the Mean, Treat-
ments and Error are distributed independently as $\chi^2(\sigma_{11} + \sigma_{22} - 2\sigma_{12})$. The
pooled mean square for the Mean and Treatments may therefore be compared
with the Error mean square in a variance-ratio test, the degrees of freedom being
$(n_1 + 1)$ and $n_2$. If the scales are equivalent, apart from a constant difference,
the same result is valid for Treatments and Error, while the mean square for the
Mean is proportional to a non-central $\chi^2$. Thus separate $z$- or $F$-tests on the
Mean and Treatments assist in distinguishing between hypotheses (1) and (2).
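
The two-scale procedure of this section can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: it takes a simple group comparison (equal or unequal replication, no blocks), the function name `difference_anova` and the data are hypothetical, and it returns the variance ratios with their degrees of freedom rather than significance levels.

```python
def difference_anova(x1, x2):
    """Analysis of variance of the differences d = x1 - x2 (simple group layout).

    x1, x2: one inner list per treatment, holding the replicate measurements
    of the same observations on scale 1 and on scale 2.
    """
    d = [[u - v for u, v in zip(g1, g2)] for g1, g2 in zip(x1, x2)]
    n_obs = sum(len(g) for g in d)
    grand_mean = sum(sum(g) for g in d) / n_obs

    ss_mean = n_obs * grand_mean ** 2
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in d)
    ss_error = sum((v - sum(g) / len(g)) ** 2 for g in d for v in g)

    n1 = len(d) - 1        # treatments degrees of freedom
    n2 = n_obs - len(d)    # error degrees of freedom
    ms_error = ss_error / n2
    return {
        "ss": (ss_mean, ss_treat, ss_error),
        "df": (1, n1, n2),
        # pooled Mean + Treatments against Error: hypothesis (1), d.f. (n1 + 1, n2)
        "F_equivalent": (ss_mean + ss_treat) / (n1 + 1) / ms_error,
        # separate ratios, useful for telling (1) from (2)
        "F_mean": ss_mean / ms_error,
        "F_treatments": ss_treat / n1 / ms_error,
    }


# Hypothetical data: 3 treatments, 3 replicates, scored on two scales.
scale1 = [[5.0, 6.0, 7.0], [8.0, 9.0, 7.0], [4.0, 6.0, 5.0]]
scale2 = [[4.5, 6.5, 6.0], [7.0, 9.5, 7.5], [4.5, 5.0, 5.5]]
result = difference_anova(scale1, scale2)
```

A large `F_mean` together with a small `F_treatments` would point towards hypothesis (2) rather than (1); the ratios would be referred to ordinary variance-ratio tables with the degrees of freedom shown in the comments.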

4. More than two scales. Let $\xi_{it}$ be the true mean of the $t$th treatment as
measured on the $i$th scale. The first two hypotheses may now be written re-
spectively:

(1') $\xi_{it} = \xi_t$;
(2') $\xi_{it} = \xi_t + \epsilon_i$,

for $i = 1, 2, \cdots, p$. The quantities $\epsilon_i$, whose sum may be assumed zero,
measure the constant differences among the scales.
If the interactions of all components with Scales are computed, the analysis
of variance extends formally, with the following separation of degrees of freedom:

(6)
                             d.f.
    Scales                  $(p - 1)$
    Treatments × Scales     $n_1(p - 1)$
    Error × Scales          $n_2(p - 1)$

The three lines in the analysis play the same roles as before in relation to
hypotheses (1') and (2'). When $p > 2$, however, it may be shown that the
three sums of squares are not distributed as multiples of $\chi^2$ unless (i) all scales
have the same error variance and (ii) every pair of scales has the same correla-
tion coefficient. Where these conditions are reasonably well satisfied, as hap-
pens possibly when experienced judges employ a similar scoring system, the
above analysis supplies approximate tests. But with scales which differ widely
in their experimental errors or in their degrees of intercorrelation, the validity
of variance-ratio tests is open to more serious question.

In order to obtain an exact test, we may note that hypothesis (1') is closely
related to the Wilks-Lawley hypothesis (Wilks [15], Lawley [9], Hsu [7]) that the
means of $k$ populations are all equal. If each treatment denotes a separate
population, the Wilks-Lawley hypothesis states that

(7) $\xi_{it} = \xi_i$, $(t = 1, 2, \cdots, n_1 + 1)$.




Since this differs from (1') only in the interchange of the letters $i$ and $t$, it is
clear that the two hypotheses may be subjected to the same kind of test.

For the details of the procedure we first divide the $(p - 1)$ comparisons among
scales into $(p - 1)$ single comparisons by the introduction of a set of variates
$y_i$, $(i = 1, 2, \cdots, p - 1)$:

(8) $y_i = \sum_{j=1}^{p} \lambda_{ij} x_j$.

Any set of $y$'s may be chosen, provided that they are linearly independent and
that

(9) $\sum_{j=1}^{p} \lambda_{ij} = 0$, $(i = 1, 2, \cdots, p - 1)$.

Thus with three scales we might use $y_1 = x_1 - x_2$, $y_2 = x_1 - x_3$ or $y_1 =
2x_1 - x_2 - x_3$, $y_2 = x_2 - x_3$.

The next step is to compute an analysis of variance and covariance of the $y$
variates, as follows:

(10)
                  d.f.     Sum of squares or products
    Mean           1         $m'_{ij}$
    Treatments    $n_1$      $a'_{ij}$
    Error         $n_2$      $b'_{ij}$






If hypothesis (1') holds, it follows from (9) that the three sets of quantities
$m'_{ij}$, $a'_{ij}$ and $b'_{ij}$ all follow the standard joint distribution for sums of squares
and products. Hence Wilks' test (Wilks [15], Pearson and Wilks [11], Hsu [7])
for the equality of the means of $k$ populations may be applied. For a single test
of hypothesis (1') we may use

(11) $W = \dfrac{|b'_{ij}|}{|b'_{ij} + m'_{ij} + a'_{ij}|}$.

As before, if $W$ is significant we may test whether the deviation is due to constant
differences or to other types of difference among the scales by calculating

(12) $W_m = \dfrac{|b'_{ij}|}{|b'_{ij} + m'_{ij}|}$

and

(13) $W_a = \dfrac{|b'_{ij}|}{|b'_{ij} + a'_{ij}|}$.


The flexibility of analysis of variance tests is not sacrificed; in particular we 
may test any desired subgroup of the treatments or of the scales. When there 
are only two scales the tests reduce to those given in section 3. 

The tests are invariant under homogeneous linear transformations of the $y$'s,
which explains why the form of the subdivision of the scale comparisons is im-
material. In fact for purposes of computation it is not necessary to introduce
the $y$'s. By taking a simple transformation and expressing $a'_{ij}$ in terms of $a_{ij}$,
etc., we may express $W$ directly in terms of the $x$'s, as follows:

(14) $W = \dfrac{\sum_{i,j} B_{ij}}{\sum_{i,j} (B + M + A)_{ij}}$,

where $B_{ij}$, $(B + M + A)_{ij}$ are respectively the co-factors of the matrices
$(b_{ij})$, $(b_{ij} + m_{ij} + a_{ij})$. Analogous expressions hold for $W_m$ and $W_a$. In
practice it will often be preferable to compute the $y$'s in order that particular
comparisons among the scale variates may be examined in detail.
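
The invariance just described can be checked numerically. The sketch below is an illustration only: the matrices are hypothetical, the helper names are not from the paper, and exact rational arithmetic is used so that the comparison is not blurred by rounding. It computes $W$ once from two different sets of contrasts $y$ and once as the ratio of the sums of co-factors of the $x$ matrices.

```python
from fractions import Fraction


def det(m):
    """Determinant by co-factor expansion along the first row (small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def cofactor_sum(m):
    """Sum of all co-factors of a square matrix with at least two rows."""
    p = len(m)
    total = 0
    for i in range(p):
        for j in range(p):
            minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
            total += (-1) ** (i + j) * det(minor)
    return total


def contrast_form(lam, m):
    """Sums of squares and products of the y's: lam * m * lam^T."""
    p = len(m)
    return [[sum(r[i] * m[i][j] * s[j] for i in range(p) for j in range(p))
             for s in lam] for r in lam]


def w_from_contrasts(lam, b, total):
    return Fraction(det(contrast_form(lam, b)), det(contrast_form(lam, total)))


def w_from_cofactors(b, total):
    return Fraction(cofactor_sum(b), cofactor_sum(total))


# Hypothetical sums of squares and products for p = 3 scales.
b = [[4, 1, 0], [1, 3, 1], [0, 1, 5]]         # Error line, (b_ij)
total = [[9, 2, 1], [2, 8, 3], [1, 3, 12]]    # (b_ij + m_ij + a_ij)

lam_a = [[1, -1, 0], [1, 0, -1]]     # y1 = x1 - x2, y2 = x1 - x3
lam_b = [[2, -1, -1], [0, 1, -1]]    # y1 = 2x1 - x2 - x3, y2 = x2 - x3
```

Both sets of contrasts and the co-factor form give the same value of $W$ for these matrices, which is what the invariance argument leads one to expect.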

The form of the frequency distribution has been worked out by Wilks [15].
For small values of $n_1$ and $p$, the test of significance can be referred to the recent
tables of the significance levels of the incomplete Beta-function, Thompson [13],
or to variance-ratio tables. Such cases are listed below, from Wilks [15] and
Hsu [7]. In our notation, $n$ is taken as $(n_1 + 1)$ in equation (11), as 1 in equa-
tion (12) and as $n_1$ in equation (13).

$p = 3$, $n > 1$: $f(W) \propto W^{(n_2 - 3)/2}(1 - \sqrt{W})^{n - 1}$;
$F\{2n, 2(n_2 - 1)\} = \dfrac{(n_2 - 1)(1 - \sqrt{W})}{n\sqrt{W}}$.

$n = 1$: $f(W) \propto W^{(n_2 - p)/2}(1 - W)^{(p - 3)/2}$;
$F\{p - 1, n_2 - p + 2\} = \dfrac{(n_2 - p + 2)(1 - W)}{(p - 1)W}$.

This distribution applies to all tests made on the Mean, equation (12), and all
cases where a single degree of freedom is isolated from the treatment comparisons.

$n = 2$: $f(W) \propto W^{(n_2 - p)/2}(1 - \sqrt{W})^{p - 2}$;
$F\{2(p - 1), 2(n_2 - p + 2)\} = \dfrac{(n_2 - p + 2)(1 - \sqrt{W})}{(p - 1)\sqrt{W}}$.

A tabulation of the distributions for four and five scales would be useful.
Hsu [7] has shown that as $n_2$ becomes large, the distribution of $-n_2 \log W$ tends
to that of $\chi^2$ with $n(p - 1)$ degrees of freedom. In general, this approximation
does not agree very well with the exact distributions above unless $n_2$ exceeds 60.
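
As a computational aid, the conversions from $W$ to a variance ratio can be collected in one small function. This is a sketch under assumptions: it uses the standard exact results for Wilks' criterion with $p - 1$ variates, hypothesis degrees of freedom $n$ and Error degrees of freedom $n_2$; the function names are hypothetical; and it returns the statistic and its degrees of freedom only, leaving the significance level to variance-ratio or $\chi^2$ tables.

```python
import math


def wilks_to_f(W, n, n2, p):
    """Return (F, df1, df2) when an exact variance-ratio form is available.

    W: Wilks' criterion; n: hypothesis d.f.; n2: Error d.f.; p: number of scales.
    Returns None when none of the listed special cases applies.
    """
    if n == 1:
        return (n2 - p + 2) * (1 - W) / ((p - 1) * W), p - 1, n2 - p + 2
    root = math.sqrt(W)
    if n == 2:
        return ((n2 - p + 2) * (1 - root) / ((p - 1) * root),
                2 * (p - 1), 2 * (n2 - p + 2))
    if p == 3:
        return (n2 - 1) * (1 - root) / (n * root), 2 * n, 2 * (n2 - 1)
    return None


def chi_square_approx(W, n, n2, p):
    """Large-sample form: -n2 log W referred to chi-square with n(p - 1) d.f."""
    return -n2 * math.log(W), n * (p - 1)
```

For example, `wilks_to_f(0.25, 3, 10, 3)` gives a variance ratio of 3.0 on 6 and 18 degrees of freedom. When `p = 3` and `n = 2` two of the listed cases overlap, and they give the same statistic and degrees of freedom.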


5. Interpretation as a problem in canonical correlations. As an introduction 
to the methods that will be used in testing the hypothesis of linearity, we may 
note that hypotheses (1') and (2') can be described in terms of canonical correla-
tions. Fisher [5] has pointed out that the roots $\theta$ of the equation

(15) $|a_{ij} - \theta(a_{ij} + b_{ij})| = 0$,







are the squares of the sample canonical correlations between the $x$-variates and
a set of $n_1$ dummy variates which represent the $n_1$ degrees of freedom among
treatments. In order to obtain the corresponding equation for the population
correlations, we may suppose that $n_1$ and $p$ remain constant while the number
of replicates $r'$ and consequently $n_2$ increase without limit. After the removal of
a common factor $r'$, equation (15) becomes

































(16) $|\psi_{ij} - \rho^2(\psi_{ij} + \nu\sigma_{ij})| = 0$,

where

(17) $\psi_{ij} = \sum_{t=1}^{n_1+1} (\xi_{it} - \bar{\xi}_i)(\xi_{jt} - \bar{\xi}_j)$.

The value of the coefficient $\nu$ depends on the type of experimental design. For
a randomized block layout, $\nu = n_1$, and for a simple group comparison $\nu =
(n_1 + 1)$.

Now if hypothesis (2') is true, i.e., $\xi_{it} = \xi_t + \epsilon_i$, it follows that $\psi_{ij}$ is inde-
pendent of $i$ and $j$. In this event equation (16) has $(p - 1)$ roots $\rho^2$ which are
identically zero. The remaining root corresponds to the best discriminant func-
tion, Fisher [5], and does not vanish unless the treatments have no effects on
any of the $x$-variates.

Let $\sum \beta_i x_i$ be a population canonical variate for the scale variables. The
coefficients $\beta_i$ satisfy the equations

(18) $\sum_{i=1}^{p} \beta_i \{\psi_{ij} - \rho^2(\psi_{ij} + \nu\sigma_{ij})\} = 0$, $(j = 1, \cdots, p)$.

For a zero root $\rho^2 = 0$ we have $\psi_{ij}$ = constant. Hence if a zero root is substi-
tuted, equation (18) degenerates into

(19) $\beta_1 + \beta_2 + \cdots + \beta_p = 0$.

To summarize, hypothesis (2') specifies that (i) $(p - 1)$ of the population
canonical correlations vanish and (ii) any variate $\sum \beta_i x_i$ is a canonical scale
variate corresponding to a zero root, provided that equation (19) is satisfied.
Analogous results hold for hypothesis (1'); in this case we replace the Treatments
line of the analysis of variance by the (Treatments + Mean) line.





6. Test for linear relationship—two scales. We may assume $n_1 \geq 2$; other-
wise no test of linearity is possible. If the values of $\alpha$, $\beta$ and $\gamma$ in equations (3)
are known, the problem can be reduced to that of testing hypothesis (1) or (2).
Since this case is unlikely to be encountered frequently in practice, further details
are omitted.

When $\alpha$, $\beta$ and $\gamma$ are unknown, we may theoretically replace the variates $x_1$
and $x_2$ by $v_1 = \alpha x_1 + \beta x_2$ and $v_2 = \mu_1 x_1 + \mu_2 x_2$, where $\mu_1$ and $\mu_2$ are chosen so
that $v_1$ and $v_2$ are independently distributed. If hypothesis (3) holds, it follows
from (17) that in terms of the $v$'s, $\psi_{11} = \psi_{12} = 0$. Since in addition $\sigma_{12} = 0$,
the two roots of equation (16) are

(20) $\rho^2 = 0$ and $\rho^2 = \psi_{22}/(\psi_{22} + \nu\sigma_{22})$.
















Thus hypothesis (3) implies that one of the population canonical correlations
vanishes. Unlike the previous case, however, we cannot construct the corre-
sponding canonical variate, which requires knowledge of $\alpha$ and $\beta$.

The selection of a sample test criterion opens up some difficulties. Pending
further elucidation of the problem, the natural choice seems to be the square
$r_2^2$ of the lower sample canonical correlation, or the equivalent quantity $h_2 =
r_2^2/(1 - r_2^2)$, where $h_2$ is the lower root of the equation:

(21) $|a_{ij} - h b_{ij}| = 0$.

It appears likely, however, that $r_2^2$ and $h_2$ are not sufficient estimates of the
corresponding population parameters.
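
For two scales the roots of the determinantal equation $|a_{ij} - h b_{ij}| = 0$ come from an ordinary quadratic, and each squared canonical correlation follows from $r^2 = h/(1 + h)$. The following is a minimal sketch with hypothetical matrices and function names; it assumes symmetric $2 \times 2$ inputs with the Error matrix positive definite.

```python
import math


def roots_two_scales(a, b):
    """Roots h1 >= h2 of |a - h b| = 0 for symmetric 2x2 matrices a and b."""
    qa = b[0][0] * b[1][1] - b[0][1] ** 2
    qb = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2 * a[0][1] * b[0][1])
    qc = a[0][0] * a[1][1] - a[0][1] ** 2
    disc = math.sqrt(max(qb * qb - 4 * qa * qc, 0.0))
    return (-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)


def squared_canonical_correlation(h):
    """Convert a root h into the corresponding r^2 = h / (1 + h)."""
    return h / (1 + h)


# Hypothetical Treatments and Error sums of squares and products.
a = [[8.0, 2.0], [2.0, 5.0]]
b = [[4.0, 0.0], [0.0, 1.0]]
h1, h2 = roots_two_scales(a, b)
r2_lower = squared_canonical_correlation(h2)
```

Here `h2` is the proposed test criterion, and `r2_lower` satisfies the equivalent equation $|a_{ij} - \theta(a_{ij} + b_{ij})| = 0$ for the squared sample canonical correlations.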

When $n_2$ is large, Hsu [8] has shown that the distribution of $n_2 h_2$ tends to that
of $\chi^2$ with $(n_1 - 1)$ degrees of freedom. A considerable advance towards the
small-sample distribution is obtainable from Madow [10], who developed an
expression for the exact distribution of $r_1$ and $r_2$ when one of the population
correlations is different from zero. In our notation this result, which is an im-
portant generalization of the distribution found by Fisher [5] and Girshick [6],
may be written as follows:

(22) [joint density of $r_1$ and $r_2$, depending on $n_1$, $n_2$ and $\rho_1$; the expression is illegible in the source]

where $\rho_1$ is the non-vanishing population correlation. It is evident from the
form of (22) that the distribution of $r_2^2$ or $h_2$ involves $\rho_1$. The conditional dis-
tribution of $h_2/h_1$ may be relatively insensitive to changes in $\rho_1$, though even
this distribution does not seem entirely independent of $\rho_1$.

When $\rho_1$ is unity, the small-sample distribution of $h_2$ is that of the ratio of two
independent sums of squares, i.e., $h_2 = (n_1 - 1)e^{2z}/n_2$, with $(n_1 - 1)$ and $n_2$
degrees of freedom. This result is a particular case of a more general result
proved in section 8. From (20) it is seen that $\rho_1$ is close to unity when $\psi_{22}$ is
large relative to $\sigma_{22}$, i.e., when the real differences among the treatments are
large relative to the experimental errors. In the absence of a usable exact
solution, the $F$-distribution may be a better approximation than the large-sample
distribution of $h_2$ for data where $r_1$ is found to be close to unity, though proof of
this statement is not yet available.

If it is desired to test hypothesis (3) with the additional assumption that
$\gamma = 0$, we replace $a_{ij}$ by $(a_{ij} + m_{ij})$ in equation (21) for $h_2$, and $n_1$ by $(n_1 + 1)$
in the distribution theory.








7. Connection with the method of least squares. The previous approach has
an interesting connection with the method of least squares. We are required to
test the linearity of relationship between $(n_1 + 1)$ pairs of means $(\bar{x}_{1t}, \bar{x}_{2t})$.
Both variates are subject to error and the errors are correlated; with $r'$ replica-
tions the population variances and covariance of these means are $\sigma_{11}/r'$, $\sigma_{22}/r'$
and $\sigma_{12}/r'$. For these unknown quantities we have sample estimates $b_{11}/n_2 r'$,
$b_{22}/n_2 r'$ and $b_{12}/n_2 r'$ respectively, derived from the Error line of the analysis of
variance.

The procedure suggested by the method of least squares is to estimate the
parameters of the line and use the deviations of the points $(\bar{x}_{1t}, \bar{x}_{2t})$ from the
line for a test of linearity. If the population variances were known, the un-
known quantities $\alpha$, $\beta$, $\gamma$ and $\xi_{it}$ would be estimated by minimizing the quadratic
form:














(23) $\sigma^{11} \sum_{t=1}^{n_1+1} r'(\bar{x}_{1t} - \xi_{1t})^2 + 2\sigma^{12} \sum_{t=1}^{n_1+1} r'(\bar{x}_{1t} - \xi_{1t})(\bar{x}_{2t} - \xi_{2t}) + \sigma^{22} \sum_{t=1}^{n_1+1} r'(\bar{x}_{2t} - \xi_{2t})^2$,

subject to the linear relations (3). Here $(\sigma^{ij})$ is the matrix inverse to $(\sigma_{ij})$. On
substitution of the estimates, expression (23), which is positive definite, would
serve as a "sum of squares" of deviations from the line and therefore as a test
criterion. This criterion is of course a direct generalization of the weighted
sum of squares which is used when the errors are independent.

Van Uven [14] gave an elegant method by which the sum of squares of devia-
tions can be found directly, before solving for any of the unknown quantities.
In our notation he showed that the sum of squares of deviations is the smaller
root $H_2$ of the equation

(24) $|a_{ij} - H\sigma_{ij}| = 0$,

where $a_{ij}$ is as before the treatments sum of squares or products.
Suppose that in default of knowledge of the $\sigma_{ij}$ we derive the weights from the
sample estimates $b_{ij}/n_2$; i.e., we minimize (23) with $b^{ij}$ in place of $\sigma^{ij}$, where
$(b^{ij})$ is the matrix inverse to $(b_{ij}/n_2)$. In this case the method of Van Uven shows
that the sum of squares of deviations from the best-fitting line is the smaller root
$H_2$ of the equation

(25) $|a_{ij} - H b_{ij}/n_2| = 0$.













Comparing (25) with (21) we find $H_2 = n_2 h_2$. Consequently the least squares
approach, with sample weights substituted in (23) for the unknown true weights,
leads to $h_2$ as a test criterion. Further, Hsu's [8] proof that the distribution
of $n_2 h_2$ tends to $\chi^2$ with $(n_1 - 1)$ degrees of freedom establishes for this case the
standard least-squares result for the distribution of the residual sum of squares,
namely that when the population weights are known, the residual sum of
squares is distributed as $\chi^2$, with degrees of freedom equal to the number of points,
$2(n_1 + 1)$, minus the number of independent unknowns, $(n_1 + 3)$. By a trans-
formation of the $x$-variates to independent variables, this result can be obtained
alternatively from a theorem by Deming [2].











8. Test for linear relationship—more than two scales. The extension of 
hypothesis (3) to the case of $p$ scales can be expressed by means of the equations

(3') $\alpha_i \xi_{1t} + \beta_i \xi_{it} = \gamma_i$, $(i = 2, \cdots, p;\ t = 1, \cdots, n_1 + 1)$.

The equations, $(p - 1)(n_1 + 1)$ in number, postulate a linear relation between
$x_1$ and every other variate and consequently imply a linear relation between
any pair of variates $x_i$ and $x_j$.

Consider the variates $v_i = \alpha_i x_1 + \beta_i x_i$, $(i = 2, \cdots, p)$. For $v_1$ we choose
the linear function of the $x$'s which is independent of $v_2, \cdots, v_p$. Thus in equa-
tion (16) for the population canonical correlations we have $\psi_{ij} = 0$, $(i, j \geq 2)$
and $\sigma_{1j} = 0$, $(j > 1)$. It follows that all roots of equation (16) are zero except
one, the non-vanishing root being $\rho^2 = \psi_{11}/(\psi_{11} + \nu\sigma_{11})$. If each treatment
denotes a separate population, hypothesis (3') is therefore identical with Fisher's
hypothesis [4], that the populations are collinear.

As a test criterion for this hypothesis Fisher has suggested the sum of the roots
of equation (21), excluding the highest root, i.e., $V' = \sum h_i = \sum r_i^2/(1 - r_i^2)$.
If $n_1 \geq p$ the sum extends over $(p - 1)$ roots, while if $n_1 < p$ the sum extends
over $(n_1 - 1)$ roots. For computational purposes it may be more expeditious
to form this sum by subtraction. Hsu [7] has pointed out that the sum of all
roots is given by $V = \sum_{i,j} b^{ij} a_{ij}$, which is obtained readily when the inverse of
$(b_{ij})$ has been calculated. The largest root of (21) is then found and subtracted
from $V$.
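
The computation by subtraction might be organized as in the sketch below. It is an illustration under assumptions, not the author's procedure in detail: the sum of all roots is taken as the trace of $(b_{ij})^{-1}(a_{ij})$, which equals $\sum b^{ij} a_{ij}$ for symmetric matrices, and the largest root is found by simple power iteration, a choice made here for brevity. All function names and the data are hypothetical.

```python
def solve(b, a):
    """Return b^{-1} a by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    aug = [[float(v) for v in b[i]] + [float(v) for v in a[i]] for i in range(p)]
    for c in range(p):
        pivot_row = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(p):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[p:] for row in aug]


def largest_root(m, iterations=500):
    """Largest root of m = b^{-1} a by power iteration (roots real, non-negative)."""
    p = len(m)
    v = [1.0] * p
    root = 0.0
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        root = max(abs(x) for x in w)
        if root == 0.0:
            return 0.0
        v = [x / root for x in w]
    return root


def fisher_v(a, b):
    """Return (V, V'): the sum of all roots of (21), and that sum less the largest."""
    m = solve(b, a)
    v_all = sum(m[i][i] for i in range(len(m)))
    return v_all, v_all - largest_root(m)


# Hypothetical Treatments and Error matrices for two scales.
a = [[8.0, 2.0], [2.0, 5.0]]
b = [[4.0, 0.0], [0.0, 1.0]]
V, V_prime = fisher_v(a, b)
```

With two scales `V_prime` is simply the lower root of (21); with more scales the same code applies unchanged, though a library eigenvalue routine would normally be preferred to power iteration.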

Fisher [4] also suggested that when equations (3') hold, the distribution of
$V'$ is approximately that of $\chi^2$ with $(p - 1)(n_1 - 1)$ degrees of freedom. This
result has been confirmed by Hsu [8] as the limiting form of the $V'$ distribution
when $n_2$ tends to infinity. As in the case of two scales, the small-sample distri-
bution is as yet unknown; it presumably contains $\rho_1$, the non-vanishing correla-
tion, as a nuisance parameter.

Some progress towards the small-sample distribution can be made without
difficulty in the case where $\rho_1 = 1$. For then $v_1$ must have a zero Error sum
of squares in every sample from the population, i.e., $v_1$ is constant within any
given treatment. Consequently (i) $b_{1i} = 0$ for $i = 1, \cdots, p$, and (ii) $a_{1i}^2/a_{11}$
is a single degree of freedom from the Treatments sum of squares of $v_i$. On
account of conditions (i), equation (21) reduces to

(26) $\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{12} & a_{22} - hb_{22} & \cdots & a_{2p} - hb_{2p} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} - hb_{2p} & \cdots & a_{pp} - hb_{pp} \end{vmatrix} = 0$.

Subtract $a_{1i}/a_{11}$ times the first row from the $i$th row, for $i = 2, \cdots, p$. It follows
that one root is infinite; the rest are the roots of the equation

(27) $|a'_{ij} - hb_{ij}| = 0$, $(i, j = 2, \cdots, p)$,

where $a'_{ij} = a_{ij} - a_{1i}a_{1j}/a_{11}$.



















If hypothesis (3') holds, the quantities $a'_{ij}$ follow the Wishart distribution [16]
with $(n_1 - 1)$ degrees of freedom. Hence the joint distribution of $h_2, \cdots, h_p$
(or $h_{n_1}$) is that which is obtained when all the population canonical correlations
vanish, with $(n_1 - 1)$ in place of $n_1$. For $n_1 \geq p$, the distribution function
(apart from the constant term) is:

(28) $\prod_{i=2}^{p} h_i^{(n_1 - p - 1)/2} (1 + h_i)^{-(n_1 + n_2 - 1)/2} \prod_{i < j} (h_i - h_j)$.

For two scales, $(p = 2)$, we reach the result mentioned in section 6, that $V' = h_2$
is distributed as $(n_1 - 1)e^{2z}/n_2$. This result can also be obtained directly from
(27). When $p = 3$, the distribution of $V'$ is obtainable from a result by Hsu [7].






9. Measures of relative sensitivity. We propose to discuss briefly the esti-
mation of the relative sensitivity of two scales and to indicate the types of
distribution that are involved. If there are only two treatments, $t$, $t'$, an ap-
propriate definition of the true sensitivity of the $i$th scale is

(29) $\dfrac{(\xi_{it'} - \xi_{it})^2}{2\sigma_{ii}}$,

or some simple function of this quantity. In justification, we may observe
that for a fixed number of replicates, the power function of the $t$-test in the $i$th
scale depends entirely on this quantity. An unbiased sample estimate is

(30) $\dfrac{(n_2 - 2)(\bar{x}_{it'} - \bar{x}_{it})^2}{2b_{ii}} - \dfrac{1}{r'}$,

where $r'$ is the number of replicates. Since (30) involves a non-central variance
ratio, confidence limits for the true sensitivity can be found from Fisher's Type
C distribution, Fisher [3].

It follows from (3) and (29) that if two scales are linearly related (including
the case of equivalence) their relative sensitivity is constant for all treatment
comparisons. For scale 1 relative to scale 2 the sensitivity is measured by
$\beta^2\sigma_{22}/\alpha^2\sigma_{11}$.

If the scales are equivalent, apart possibly from a constant difference, this
quantity reduces to $\phi = \sigma_{22}/\sigma_{11}$, for which $F = b_{22}/b_{11}$ serves as a sample estimate.
A test of significance of the sample ratio and confidence limits for the true ratio
may be obtained from Pitman [12], who showed that

(31) $\dfrac{F - \phi}{\sqrt{(F + \phi)^2 - 4r_{12}^2 F\phi}}$

follows the distribution of a sample correlation coefficient from $(n_2 + 1)$ pairs
of observations. In (31), $r_{12}^2 = b_{12}^2/b_{11}b_{22}$. The same procedure may be used
whenever $\alpha$ and $\beta$ are known.
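
Pitman's criterion is easy to evaluate from the Error line alone. The sketch below assumes the form $(F - \phi)/\sqrt{(F + \phi)^2 - 4 r_{12}^2 F \phi}$ with $F = b_{22}/b_{11}$ and $r_{12}^2 = b_{12}^2/(b_{11} b_{22})$; the function name and the numerical values are hypothetical. The result would be referred to tables of the sample correlation coefficient.

```python
import math


def pitman_statistic(b11, b12, b22, phi):
    """Pitman's criterion for the hypothesis sigma_22 / sigma_11 = phi.

    b11, b12, b22: Error sums of squares and products for the two scales.
    """
    F = b22 / b11
    r12_squared = b12 ** 2 / (b11 * b22)
    return (F - phi) / math.sqrt((F + phi) ** 2 - 4.0 * r12_squared * F * phi)


# Hypothetical Error line: b11 = 4, b12 = 1, b22 = 9; test equal error variances.
stat = pitman_statistic(4.0, 1.0, 9.0, 1.0)
```

The same value is obtained as the sample correlation between $x_2 + \sqrt{\phi}\,x_1$ and $x_2 - \sqrt{\phi}\,x_1$ computed from the Error sums of squares and products, which is one way to see why the correlation tables apply.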

When $\alpha$ and $\beta$ are unknown, a sample estimate of the relative sensitivity is
$b^2 b_{22}/a^2 b_{11}$, where $(ax_1 + bx_2)$ is the discriminant function which corresponds to
the lower root of equation (21). We have not been able to reach the distribution
of this estimate. Confidence limits for the relative sensitivity can, however,
be obtained when $n_2$ is sufficiently large so that $\sigma_{11}$ and $\sigma_{22}$ may be assumed known.
For in that case the problem reduces to that of finding confidence limits for
$\beta^2/\alpha^2$. Now if $\alpha$, $\beta$ are the true coefficients, the quantity

(32) $\dfrac{\alpha^2 a_{11} + 2\alpha\beta a_{12} + \beta^2 a_{22}}{\alpha^2 b_{11} + 2\alpha\beta b_{12} + \beta^2 b_{22}}$

follows the $n_1 e^{2z}/n_2$ distribution. Any proposed values of $\alpha$ and $\beta$ which make
(32) significant are rejected by the evidence of the sample. By equating (32)
to the desired significance level of $n_1 e^{2z}/n_2$, we get a quadratic equation for the
two limits of $\beta/\alpha$. The limits will not be narrow unless the treatment effects
are large.

If the relation between the scales is non-linear, and the assumption of a con-
stant error variance throughout an individual scale is valid, the relative sensi-
tivity differs for different treatment comparisons. Even in this event estimates
of relative sensitivity may be of interest. Attention might be restricted to a
single degree of freedom from the treatment comparisons, in which case the
definition for two treatments could be applied.

Alternatively an estimate might be wanted of the average relative sensitivity
over all treatment comparisons. For a given number of replicates, the power
function of the variance-ratio test of the treatment effects in the $i$th scale de-
pends only on the quantity

(33) $\dfrac{\sum_{t} (\xi_{it} - \bar{\xi}_i)^2}{n_1 \sigma_{ii}}$.

Consequently this quantity, which is an extension of (29), might be chosen as
a measure of average sensitivity. The corresponding generalization of the
unbiased sample estimate (30) is

(34) $\dfrac{(n_2 - 2)a_{ii}}{n_1 r' b_{ii}} - \dfrac{1}{r'}$.

Since the quantity $a_{ii}/b_{ii}$ is a multiple of a non-central variance ratio, the com-
parison of two scales involves a test of significance of the hypothesis that two
non-central variance ratios are equal.
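
The quadratic for the limits of $\beta/\alpha$ mentioned earlier in this section can be written out explicitly. Setting the ratio of the two quadratic forms in $\alpha$, $\beta$ equal to a chosen level $c$ of $n_1 e^{2z}/n_2$, and writing $k = \beta/\alpha$, gives $(a_{22} - c b_{22})k^2 + 2(a_{12} - c b_{12})k + (a_{11} - c b_{11}) = 0$. The sketch below solves it; the function names, the matrices and the value of $c$ are hypothetical, and $c$ must be taken from variance-ratio tables by the user.

```python
import math


def ratio_limits(a, b, c):
    """Roots k = beta/alpha of the quadratic obtained by equating the ratio to c.

    a, b: symmetric 2x2 Treatments and Error matrices; c: chosen level of
    n1 * F / n2. Returns None if the quadratic is degenerate or has no real roots.
    """
    qa = a[1][1] - c * b[1][1]
    qb = 2.0 * (a[0][1] - c * b[0][1])
    qc = a[0][0] - c * b[0][0]
    disc = qb * qb - 4.0 * qa * qc
    if qa == 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return tuple(sorted(((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa))))


def form_ratio(a, b, k):
    """Ratio of the Treatments to the Error quadratic form with alpha = 1, beta = k."""
    num = a[0][0] + 2.0 * k * a[0][1] + k * k * a[1][1]
    den = b[0][0] + 2.0 * k * b[0][1] + k * k * b[1][1]
    return num / den


# Hypothetical matrices and level.
a = [[8.0, 2.0], [2.0, 5.0]]
b = [[4.0, 0.0], [0.0, 1.0]]
limits = ratio_limits(a, b, 2.0)
```

Real limits exist only when $c$ lies between the two roots of $|a_{ij} - h b_{ij}| = 0$. Squaring the limits and multiplying by the assumed known $\sigma_{22}/\sigma_{11}$ would then give limits for the relative sensitivity.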


10. Summary. This paper discusses the analysis of data obtained when the 
results of a replicated experiment are measured on several different scales which 
we wish to compare. Recent work in multivariate analysis provides tests of 
the hypothesis that the treatment effects are the same in all scales, and of the 
hypothesis that the scales are linearly related. When the number of Error 
degrees of freedom is large, the significance levels of these tests are obtainable 
from the standard tables. For small sample tests, further investigation and
tabulation of certain distributions will be needed, particularly that of the sample
canonical correlations when one population correlation differs from zero.

A brief discussion is given of methods for comparing the relative sensitivity of 
two scales. 


REFERENCES 


[1] K. H. Coward, The Biological Standardization of the Vitamins, 1937.
[2] W. E. Deming, "The chi-test and curve-fitting," Jour. Amer. Stat. Assn., Vol. 29 (1934),
pp. 372-382.
[3] R. A. Fisher, "The general sampling distribution of the multiple correlation
coefficient," Proc. Roy. Soc. A, Vol. 121 (1928), pp. 654-673.
[4] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of
Eugenics, Vol. 8 (1938), pp. 376-386.
[5] R. A. Fisher, "The sampling distribution of some statistics obtained from non-linear
equations," Annals of Eugenics, Vol. 9 (1939), pp. 238-249.
[6] M. A. Girshick, "On the sampling theory of the roots of determinantal equations,"
Annals of Math. Stat., Vol. 10 (1939), pp. 203-224.
[7] P. L. Hsu, "On generalized analysis of variance (I)," Biometrika, Vol. 31 (1940), pp.
221-237.
[8] P. L. Hsu, "The problem of rank and the limiting distribution of Fisher's test
function," Annals of Eugenics, Vol. 11 (1941), pp. 39-41.
[9] D. N. Lawley, "A generalization of Fisher's z-test," Biometrika, Vol. 30 (1938), pp.
180-187.
[10] W. G. Madow, "Contributions to the theory of multivariate statistical analysis,"
Trans. Amer. Math. Soc., Vol. 44 (1938), p. 490.
[11] E. S. Pearson and S. S. Wilks, "Methods of statistical analysis appropriate for k
samples of two variables," Biometrika, Vol. 25 (1933), pp. 353-378.
[12] E. J. G. Pitman, "A note on normal correlation," Biometrika, Vol. 31 (1939), pp. 9-12.
[13] C. M. Thompson, "Tables of percentage points of the incomplete Beta-function,"
Biometrika, Vol. 32 (1942), pp. 151-181.
[14] M. J. Van Uven, "Adjustment of N points (in n-dimensional space) to the best linear
(n - 1) dimensional space," Proc. Koninklijke Akad. van Wetenschappen te
Amsterdam, Vol. 33 (1930), pp. 143-157.
[15] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, Vol.
24 (1932), pp. 471-494.
[16] J. Wishart, "The generalized product moment distribution in samples from a normal
multivariate population," Biometrika, Vol. 20A (1928), pp. 32-52.





ON STOCHASTIC LIMIT AND ORDER RELATIONSHIPS 


By H. B. Mann and A. Wald¹
Columbia University 


1. Introduction. The concept of a stochastic limit is frequently used in
statistical literature. Writers of papers on problems in statistics and probability
usually prove only those special cases of more general theorems which are
necessary for the solution of their particular problems. Thus readers of statistical
papers are confronted with the necessity of laboriously ploughing through
details, a task which is made more difficult by the fact that no uniform notation
has as yet been introduced. It is therefore the purpose of the present paper to
outline a systematic theory of stochastic limit and order relationships and at the
same time to propose a convenient notation analogous to the notation of ordinary
limit and order relationships. The theorems derived in this paper are of a more
general nature and seem to contain, to the authors' knowledge, all previous results
in the literature. For instance the so-called δ-method for the derivation of
asymptotic standard deviations and limit distributions, also two lemmas by
J. L. Doob [1] on products, sums and quotients of random variables and a
theorem derived by W. G. Madow [2] are special cases of our results. It is hoped
that such a general theory together with a convenient notation will considerably
facilitate the derivation of theorems concerning stochastic limits and limit
distributions. In section 2 we define the notion of convergence in probability and
that of stochastic order and derive 5 theorems of a very general nature.
Section 3 contains 2 corollaries of these general theorems which have so far been
most important in applications.

We shall frequently need the concept of a vector. A vector $a = (a^1, \cdots, a^r)$
is an ordered set of $r$ numbers $a^1, \cdots, a^r$. The numbers $a^1, \cdots, a^r$
are called the components of $a$. If the components are random variables then the
vector is called a random vector.

We shall generally denote by $a$, $b$ constant vectors, by $x$, $y$ random vectors and
by $a^1, \cdots, a^r$, $x^1, \cdots, x^r$ their components. Differing from the usual
practice we shall put $|a| = (|a^1|, \cdots, |a^r|)$ and we shall write $a < b$ or
$a \le b$ if $a^i < b^i$ or $a^i \le b^i$ for every $i$. This notation saves a great
amount of writing, since all our theorems except Theorem 4 are valid for sequences
of any number of jointly distributed variates.

We shall review here the ordinary order notation. In all that follows let f(N) 
be a positive function defined for all positive integers N. 


1 Research under a grant-in-aid from the Carnegie Corporation of New York. 








We write

$a_N = o[f(N)]$ if $\lim_{N\to\infty} a_N/f(N) = 0$,

$a_N = O[f(N)]$ if $|a_N| \le M f(N)$ for all $N$ and a fixed $M > 0$,

$a_N = \Omega[f(N)]$ if $0 < M' f(N) \le |a_N| \le M f(N)$ for almost all $N$ and for
two fixed numbers $M > M' > 0$,

$a_N = \omega[f(N)]$ if $0 < M f(N) \le |a_N|$ for almost all $N$ and a fixed $M > 0$.

For instance, $\log N = o(N^{\epsilon})$ for every $\epsilon > 0$, $\sin N/N = O(1/N)$,
$(3 + 4N)/(4 + 8\sqrt{N}) = \Omega(\sqrt{N})$, and $5/\sin N = \omega(1)$.
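
The order relations above can be checked numerically for the examples just given. The following is a minimal sketch, not part of the original paper; the helper name `ratio` and the particular values of N are illustrative choices only.

```python
import math

def ratio(n: int) -> float:
    """Return a_N / f(N) for a_N = (3 + 4N)/(4 + 8*sqrt(N)) and f(N) = sqrt(N)."""
    a_n = (3 + 4 * n) / (4 + 8 * math.sqrt(n))
    return a_n / math.sqrt(n)

sample_sizes = (10, 1_000, 100_000, 10_000_000)

# The ratio stays between two positive constants (it approaches 1/2),
# which is what the Omega relation requires.
ratios = [ratio(n) for n in sample_sizes]

# log N / N**eps shrinks toward 0 (here eps = 0.5), illustrating the o relation.
little_o = [math.log(n) / n ** 0.5 for n in sample_sizes]
```

Printing `ratios` and `little_o` shows the first sequence levelling off and the second decreasing, in line with the definitions of $\Omega$ and $o$.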

For any statement V we shall denote by P(V) the probability that V holds.





2. General theorems on stochastic limit and order relationships.
DEFINITION 1. We write $\operatorname{plim}_{N\to\infty} x_N = 0$ (in words, $x_N$
converges in probability to 0 with increasing $N$) if for every $\epsilon > 0$,
$\lim_{N\to\infty} P(|x_N| \le \epsilon) = 1$. Further $\operatorname{plim}_{N\to\infty} x_N = x$
if $\operatorname{plim}_{N\to\infty} (x_N - x) = 0$.

DEFINITION 2. We write $x_N = o_p[f(N)]$ ($x_N$ is of probability order $o[f(N)]$) if
$\operatorname{plim}_{N\to\infty} x_N/f(N) = 0$.

DEFINITION 3. We write $x_N = O_p[f(N)]$ ($x_N$ is of probability order $O[f(N)]$) if
for each $\epsilon > 0$ there exists an $A_\epsilon > 0$ such that
$P(|x_N| \le A_\epsilon f(N)) \ge 1 - \epsilon$ for all values of $N$.

DEFINITION 4. $x_N = \Omega_p[f(N)]$ if for each $\epsilon > 0$ there exist two numbers
$A_\epsilon > 0$ and $B_\epsilon > 0$ and an integer $N_\epsilon$ such that
$P[A_\epsilon f(N) \le |x_N| \le B_\epsilon f(N)] \ge 1 - \epsilon$ for all $N > N_\epsilon$.

DEFINITION 5. $x_N = \omega_p[f(N)]$ if for every $\epsilon > 0$ there exists an
$A_\epsilon > 0$ and an integer $N_\epsilon$ such that
$P[A_\epsilon f(N) \le |x_N|] \ge 1 - \epsilon$ for all $N > N_\epsilon$.
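
Definition 3 can be illustrated by simulation. The sketch below is not from the paper: it assumes $x_N$ is the mean of $N$ independent Uniform(-1, 1) draws and $f(N) = N^{-1/2}$, and it takes $A_\epsilon$ from Chebyshev's inequality. The function name `coverage` and all numerical settings are hypothetical.

```python
import math
import random

def coverage(n: int, a_eps: float, reps: int, rng: random.Random) -> float:
    """Estimate P(|x_N| <= a_eps * f(N)) for x_N = mean of n Uniform(-1, 1) draws
    and f(N) = N**-0.5."""
    bound = a_eps / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        mean = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        if abs(mean) <= bound:
            hits += 1
    return hits / reps

rng = random.Random(0)
eps = 0.1
variance = 1.0 / 3.0                 # variance of Uniform(-1, 1)
# Chebyshev: P(|mean| > A / sqrt(N)) <= variance / A**2, which equals eps here.
a_eps = math.sqrt(variance / eps)
estimates = {n: coverage(n, a_eps, 1000, rng) for n in (10, 100, 400)}
```

The same constant `a_eps` works for every `n`, which is the content of the statement $x_N = O_p(N^{-1/2})$; the Chebyshev choice is conservative, so the estimated coverage is typically well above $1 - \epsilon$.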

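Let $E$ denote a vector space. For any subset $E'$ of $E$ the symbol $a \subset E'$ will
mean that $a$ is an element of $E'$.

Since $P(x \subset E_1 \,\&\, x \subset E_2) \ge P(x \subset E_1) - P(x \not\subset E_2)$
we evidently have

LEMMA 1. If $P(x \subset E_1) \ge 1 - \epsilon_1$, $P(x \subset E_2) \ge 1 - \epsilon_2$,
then $P(x \subset E_1 \,\&\, x \subset E_2) \ge 1 - \epsilon_1 - \epsilon_2$.

We now put $O^1 = o$, $O^2 = O$, $O^3 = \Omega$, $O^4 = \omega$.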

THEOREM 1. For every $\epsilon > 0$ let $\{R_N(\epsilon)\}$ be a sequence of subsets of the
r-dimensional Cartesian space such that $P(x_N \subset R_N(\epsilon)) \ge 1 - \epsilon$ for
all $N$ greater than a certain integer $N_\epsilon$. Let $\{g_N(x)\}$ be a sequence of
functions of $x = (x^1, x^2, \cdots, x^r)$ such that $g_N(a_N) = O^i[f(N)]$ for any
$\epsilon > 0$ and for any sequence $\{a_N\}$ for which $a_N \subset R_N(\epsilon)$. Then we
have $g_N(x_N) = O^i_p[f(N)]$.

PROOF: For $i = 1, 2, 3$, there exists a positive integer $N_\epsilon$ such that
$|g_N(a)|$ is a bounded function of $a$ in $R_N(\epsilon)$ for $N > N_\epsilon$. For
otherwise we could construct a sequence $\{a_N\}$ with $a_N$ in $R_N(\epsilon)$ such that
$|g_N(a_N)| > M f(N)$ for any $M$ and for infinitely many values of $N$, which contradicts
the hypothesis of our theorem. Hence there exists an $N_\epsilon$ such that for
$N > N_\epsilon$ the function $|g_N(a)|$ is bounded in $R_N(\epsilon)$. Let $M_N(\epsilon)$ be
the l.u.b. of $|g_N(a)|/f(N)$ in $R_N(\epsilon)$. We can construct a sequence $\{a_N\}$ with
$a_N \subset R_N(\epsilon)$ such that $|g_N(a_N)|/f(N) \ge M_N(\epsilon)/2$ for all
$N > N_\epsilon$. Hence for $i = 2, 3$ the sequence $M_N(\epsilon)$ must be bounded and for
$i = 1$ we must have $\lim_{N\to\infty} M_N(\epsilon) = 0$. Let $\bar{M}(\epsilon)$ be the
l.u.b. of $M_N(\epsilon)$. For $i = 3, 4$ one shows in exactly the same manner the existence
of a g.l.b. $\underline{M}(\epsilon)$ of $|g_N(a)|/f(N)$ if $a \subset R_N(\epsilon)$ and for
$N > N_\epsilon$. Hence for sufficiently large $N$ we have

$$P[\,|g_N(x_N)| \le M_N(\epsilon) f(N)\,] \ge 1 - \epsilon \quad\text{with } \lim_{N\to\infty} M_N(\epsilon) = 0 \quad\text{for } i = 1,$$

$$P[\,|g_N(x_N)| \le \bar{M}(\epsilon) f(N)\,] \ge 1 - \epsilon \quad\text{for } i = 2,$$

$$P[\,\underline{M}(\epsilon) f(N) \le |g_N(x_N)| \le \bar{M}(\epsilon) f(N)\,] \ge 1 - \epsilon \quad\text{for } i = 3,$$

$$P[\,\underline{M}(\epsilon) f(N) \le |g_N(x_N)|\,] \ge 1 - \epsilon \quad\text{for } i = 4.$$

For $i = 2$ the existence of an $M'(\epsilon)$ such that
$P[\,|g_N(x_N)| \le M'(\epsilon) f(N)\,] \ge 1 - \epsilon$ for all $N$ follows easily from this
result. Hence our theorem is proved.

COROLLARY 1. If $x^j_N = O^{i_j}_p[f_j(N)]$ for $j = 1, 2, \cdots, r$ and
$\{R_N(\epsilon)\}$ is a sequence of subsets of the k-dimensional space
$y^1, y^2, \cdots, y^k$ such that $P[y_N \subset R_N(\epsilon)] \ge 1 - \epsilon$ for
sufficiently large $N$, and if $\{g_N(x^1, x^2, \cdots, x^r, y^1, y^2, \cdots, y^k)\}$ is a
sequence of functions of $x^1, x^2, \cdots, x^r, y^1, y^2, \cdots, y^k$ such that for any
$\epsilon > 0$ we have $g_N(a_N, b_N) = O^i[f(N)]$ for every sequence $\{a_N, b_N\}$ with
$a^j_N = O^{i_j}[f_j(N)]$ $(j = 1, 2, \cdots, r)$ and $b_N \subset R_N(\epsilon)$, then
$g_N(x_N, y_N) = O^i_p[f(N)]$.

PROOF: It follows from Lemma 1, the definition of the relation
$x^j_N = O^{i_j}_p[f_j(N)]$ and the hypothesis of our corollary that for any $\epsilon > 0$
there exists a sequence of subsets $\{R_N(\epsilon)\}$ of the space
$x^1, \cdots, x^r, y^1, \cdots, y^k$ which satisfies the conditions of Theorem 1 with
respect to the sequence of functions $\{g_N\}$. Hence Corollary 1 is an immediate
consequence of Theorem 1.

Corollary 1 implies inter alia that all operational rules for the ordinary order
and limit relations are also applicable to stochastic limit and order relations.
For instance $o[f(N)]/\Omega[g(N)] = o[f(N)/g(N)]$. Hence also
$o_p[f(N)]/\Omega_p[g(N)] = o_p[f(N)/g(N)]$.

DEFINITION 6. For any $N$ let $R_N$ be a region, $f_N(a)$ a function defined on $R_N$.
The sequence $\{f_N(a)\}$ will be said to be uniformly continuous with respect to
$\{R_N\}$ if the following condition is fulfilled. For every $\epsilon > 0$ there exists a
vector $\delta > 0$ such that for almost all $N$

$$|f_N(a + \delta') - f_N(a)| \le \epsilon \quad\text{for any } |\delta'| \le \delta \text{ and for any } a \subset R_N.$$

THEOREM 2. Let $\operatorname{plim}_{N\to\infty} (x_N - y_N) = 0$. For every $\epsilon > 0$
let $\{R_N(\epsilon)\}$ be a sequence of subsets of the r-dimensional vector space such
that for almost all $N$ we have $P[y_N \subset R_N(\epsilon)] \ge 1 - \epsilon$. If the
sequence of functions $\{f_N(a)\}$ is uniformly continuous with respect to
$\{R_N(\epsilon)\}$ for every $\epsilon > 0$, then
$\operatorname{plim}_{N\to\infty} [f_N(x_N) - f_N(y_N)] = 0$.


PROOF: We have $f_N(x_N) - f_N(y_N) = f_N(y_N + z_N) - f_N(y_N)$ where $z^j_N = o_p(1)$
for $j = 1, \cdots, r$. Because of the uniform continuity of $f_N(a)$ with respect to
$R_N(\epsilon)$ we see that for every sequence $\{a_N, b_N\}$ with
$a_N \subset R_N(\epsilon)$ and $b^j_N = o(1)$ $(j = 1, 2, \cdots, r)$

$$f_N(a_N + b_N) - f_N(a_N) = o(1).$$

Hence Theorem 2 follows from Corollary 1.
In the following we shall abbreviate "cumulative distribution function" by d.f.
DEFINITION 7. Let $\{x_N\}$ be a sequence of random variables. Let $F_N$ be the d.f.
of $x_N$. Let $x$ have the distribution $F$. We shall write $d_\infty(x_N) = d(x)$ if
$\lim_{N\to\infty} F_N = F$ in every continuity point of $F$.
THEOREM 3. Let $\operatorname{plim}_{N\to\infty} (x_N - y_N) = 0$ and
$d_\infty(y_N) = d(y)$; then $d_\infty(x_N) = d(y)$.
PROOF: Let $G_N$, $F_N$ be the d.f.'s of $x_N$, $y_N$ resp., and let $F$ be the d.f. of
$y$. For any $\delta > 0$ we have

$$P(y_N \le a + \delta) \ge P(x_N \le a;\ y_N \le a + \delta) \ge P(x_N \le a;\ |y_N - x_N| \le \delta) \ge P(x_N \le a) - P(|y_N - x_N| > \delta),$$

$$P(x_N \le a) \ge P(x_N \le a;\ y_N \le a - \delta) \ge P(y_N \le a - \delta) - P(|x_N - y_N| > \delta).$$

Hence since $P(y_N \le a) = F_N(a)$, $P(x_N \le a) = G_N(a)$,
$\lim_{N\to\infty} P(|x_N - y_N| > \delta) = 0$ we have

$$\limsup F_N(a + \delta) \ge \limsup G_N(a) \ge \liminf G_N(a) \ge \liminf F_N(a - \delta).$$

If $a + \delta$ and $a - \delta$ are continuity points of $F$ we have

$$F(a + \delta) \ge \limsup G_N(a) \ge \liminf G_N(a) \ge F(a - \delta).$$

For any $\delta_0 > 0$ there exists a positive $\delta \le \delta_0$ such that $a - \delta$
and $a + \delta$ are continuity points of $F$. Hence we can choose $\delta$ arbitrarily
small and if $a$ is a continuity point of $F$ we must have

$$\lim_{N\to\infty} G_N(a) = F(a).$$


THEOREM 4. Let $x_N$, $y_N$ be two sequences of one-dimensional vectors and let
$\operatorname{plim}_{N\to\infty} (x_N - y_N) = 0$. Let $F_N$, $G_N$ be the cumulative
distribution functions of $x_N$ and $y_N$ respectively. Let $R_N(\epsilon)$ be the set of
points $a$ for which $|F_N(a) - G_N(a)| > \epsilon$. Let $M_N(\epsilon)$ be the Lebesgue
measure of this set. Then $\lim_{N\to\infty} M_N(\epsilon) = 0$ for every $\epsilon > 0$.

We first prove the following lemma. 

LEMMA 2. Let $\delta$, $\epsilon$ be any arbitrary positive numbers and let $f$ be a
distribution function. The set of points $a$ for which $f(a + \delta) - f(a) > \epsilon$
has at most the Lebesgue measure $\delta/\epsilon$.

PROOF: The points $a$ for which $f(a + \delta) - f(a) > \epsilon$ must have a lower bound
$A$. Otherwise we could find infinitely many such points whose distance from each
other is more than $\delta$. But this contradicts the requirement that $f(\infty) = 1$.
Let $a_1$ be the g.l.b. of the $a$'s. Then for any $\eta > 0$ in the interval
$(a_1 \le x \le a_1 + \delta + \eta)$ the value of $f$ increases at least by the amount
$\epsilon$. Let now $a_2$ be the g.l.b. of the $a$'s outside of this interval. We continue
our construction by constructing the interval $(a_2 \le x \le a_2 + \delta + \eta)$ and so
forth. But after at most $1/\epsilon$ such steps the construction must stop. Hence all
points $a$ for which $f(a + \delta) - f(a) > \epsilon$ are contained in at most $1/\epsilon$
intervals of length $\delta + \eta$. Hence since $\eta$ was arbitrary the Lebesgue measure
of this set is at most $\delta/\epsilon$.
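
The bound of Lemma 2 can be checked numerically for a particular distribution function. The following sketch is illustrative only: it assumes $f$ is the standard normal d.f., approximates the Lebesgue measure by a Riemann sum on a grid, and uses hypothetical names (`normal_cdf`, `measure_of_jump_set`) and hypothetical values of $\delta$ and $\epsilon$.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def measure_of_jump_set(cdf, delta, eps, lo=-10.0, hi=10.0, step=1e-3):
    """Riemann-sum approximation of the Lebesgue measure of
    {a in [lo, hi] : cdf(a + delta) - cdf(a) > eps}."""
    steps = int(round((hi - lo) / step))
    count = 0
    for i in range(steps):
        a = lo + i * step
        if cdf(a + delta) - cdf(a) > eps:
            count += 1
    return count * step

delta, eps = 0.5, 0.15
approx_measure = measure_of_jump_set(normal_cdf, delta, eps)
lemma_bound = delta / eps   # the bound stated in Lemma 2
```

For these values the set is an interval around $-\delta/2$, and `approx_measure` comes out well below `lemma_bound`, as the lemma requires.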

We come now to the proof of our theorem. We have

$$P(x_N \le a) \ge P(x_N \le a;\ y_N \le a + \delta) \ge P(x_N \le a) - P(|x_N - y_N| > \delta),$$

$$P(y_N \le a + \delta) \ge P(x_N \le a;\ y_N \le a + \delta) \ge P(y_N \le a + \delta) - P(|x_N - y_N| > \delta) - P(a < x_N \le a + 2\delta).$$

Therefore

$$P(x_N \le a;\ y_N \le a + \delta) = P(x_N \le a) - \theta_N P(|x_N - y_N| > \delta)$$
$$= P(y_N \le a + \delta) - \theta'_N P(|x_N - y_N| > \delta) - \theta''_N P(a < x_N \le a + 2\delta),$$

where $0 \le \theta_N \le 1$, $0 \le \theta'_N \le 1$, $0 \le \theta''_N \le 1$. Hence

$$P(y_N \le a + \delta) = P(x_N \le a) + \vartheta_N P(|x_N - y_N| > \delta) + \vartheta'_N \{F_N(a + 2\delta) - F_N(a)\},$$

where $|\vartheta_N|, |\vartheta'_N| \le 1$.

By hypothesis we have $P(|x_N - y_N| > 1/m) \le 1/m$ for almost all $N$ and
every integer $m$. Hence we can choose a sequence $\{\delta_N\}$ with $\delta_N > 0$ in such
a way that $\lim_{N\to\infty} \delta_N = 0$,
$\lim_{N\to\infty} P(|x_N - y_N| > \delta_N) = 0$. We can then choose $N_\epsilon$
so that $P(|x_N - y_N| > \delta_N) \le \epsilon/3$ for $N > N_\epsilon$. Applying Lemma 2 we
see that except for a set of measure at most $6\delta_N/\epsilon$ we have
$F_N(a + 2\delta_N) - F_N(a) \le \epsilon/3$. Similarly the set of points for which
$G_N(a + \delta_N) - G_N(a) > \epsilon/3$ has at most the Lebesgue measure
$3\delta_N/\epsilon$. Hence, except in a set of points whose measure is at most
$9\delta_N/\epsilon$, we have

$$|G_N(a) - F_N(a)| \le \epsilon,$$

and this completes the proof of Theorem 4.
THEOREM 4a. Let $\operatorname{plim}_{N\to\infty} (x_N - y_N) = 0$. Let $F_N$, $G_N$ be the
distribution functions of $x_N$, $y_N$ respectively. Furthermore, let $R_N(\epsilon)$ be the
set of points inside an r-dimensional cube where $|F_N - G_N| > \epsilon$ and let
$M_N(\epsilon)$ be the Lebesgue measure of $R_N(\epsilon)$; then
$\lim_{N\to\infty} M_N(\epsilon) = 0$.

We prove first

LEMMA 2a. Let $\delta = (\delta^1, \delta^2, \cdots, \delta^r) > 0$ and
$\max_i \delta^i = d$. Let $I$ be the cube defined by
$(-A \le x^i \le A;\ i = 1, 2, \cdots, r)$. Let furthermore $f$ be a d.f. Then
the Lebesgue measure of the points $a$ in $I$ for which $f(a + \delta) - f(a) > \epsilon$ is
at most $d r^2 A^{r-1}/\epsilon$.

PROOF: Let $f_1(x^1), f_2(x^2), \cdots, f_r(x^r)$ be the marginal distributions of
$x^1, x^2, \cdots, x^r$ respectively. It follows from Lemma 2 that the linear Lebesgue
measure of those numbers $a^i$ for which $f_i(a^i + \delta^i) - f_i(a^i) > \epsilon/r$ is
smaller than $rd/\epsilon$. We form the set $(x^i = a^i \,\&\, x \subset I)$ for every such
$a^i$ and for $i = 1, 2, \cdots, r$. The Lebesgue measure of the sum $R(\epsilon)$ of all
these sets is at most $r^2 d A^{r-1}/\epsilon$. We shall show that $R(\epsilon)$ contains all
points $a$ inside $I$ for which $f(a + \delta) - f(a) > \epsilon$. We have

$$f(a^1 + \delta^1, a^2 + \delta^2, \cdots, a^r + \delta^r) - f(a^1, a^2, \cdots, a^r) = \Delta_1 + \Delta_2 + \cdots + \Delta_r,$$

where
$\Delta_i = f(a^1, \cdots, a^{i-1}, a^i + \delta^i, a^{i+1} + \delta^{i+1}, \cdots, a^r + \delta^r) - f(a^1, \cdots, a^{i-1}, a^i, a^{i+1} + \delta^{i+1}, \cdots, a^r + \delta^r)$.
If $f(a + \delta) - f(a) > \epsilon$ then we must have for at least one $i$

$$\Delta_i > \epsilon/r.$$

But $\Delta_i$ is the probability of a subset of the set
$T = (a^i < x^i \le a^i + \delta^i)$ and $f_i(a^i + \delta^i) - f_i(a^i)$ is the probability
of $T$ itself. Hence

$$\epsilon/r < \Delta_i \le f_i(a^i + \delta^i) - f_i(a^i),$$

and if $(a^1, a^2, \cdots, a^r)$ is in $I$ then it is contained in $R(\epsilon)$. Hence
Lemma 2a is proved.

The proof of Theorem 4a using Lemma 2a is similar to that of Theorem 4 and 
therefore it is omitted. 

The Jordan measure of a set $R$ with respect to the distribution function $F$ is
defined as follows. We consider only intervals whose boundary points are
continuity points of $F$. We cover $R$ with the sum $I$ of a finite number of
intervals. (The intervals themselves may also be infinite. For instance the sets
$a \le x < \infty$, $a < x < \infty$ are also considered intervals.) We consider
$M(I) = \int_I dF$ for every $I$ covering $R$. The g.l.b. of all such $M(I)$ is called the
exterior Jordan measure $\bar{M}(R)$ of $R$. Similarly we consider all sums $I$ of a finite
number of intervals which are contained in $R$. The l.u.b. of $\int_I dF$ is called the
interior Jordan measure $\underline{M}(R)$ of $R$. If $\bar{M}(R) = \underline{M}(R)$ then
$M(R) = \bar{M}(R)$ is called the Jordan measure of $R$.
LEMMA 3. Let $F_N(x)$ be a sequence of d.f.'s such that
$\lim_{N\to\infty} F_N(x) = F(x)$ in every continuity point of $F(x)$. Let $h(x)$ be a
bounded function such that the discontinuity points of $h(x)$ have the Jordan measure 0
with respect to $F$ and such that $\int_{-\infty}^{+\infty} h(x)\, dF_N(x)$ and
$\int_{-\infty}^{+\infty} h(x)\, dF(x)$ exist. Then

$$\lim_{N\to\infty} \int_{-\infty}^{+\infty} h(x)\, dF_N(x) = \int_{-\infty}^{+\infty} h(x)\, dF(x).$$


PROOF: There is only an enumerable set of hyperplanes parallel to the plane
$x^i = 0$ which have positive probability with respect to $F$. Hence we can find
for every $\delta$ an interval net whose cells have a diameter at most $\delta$ and such
that the boundary points of every cell are continuity points of $F$.

We first determine a closed finite interval $J$ such that
$\int_J dF(x) \ge 1 - \epsilon/2$ and such that the boundary points of $J$ are continuity
points of $F$. We further determine a sum $I'$ of a finite number of open intervals such
that $I'$ contains all discontinuity points of $h$, $\int_{I'} dF(x) \le \epsilon/2$ and
such that the boundary of $I'$ does not contain any discontinuity points of $F$. All this
is possible by hypothesis and because the set of hyperplanes with positive probability
is enumerable. Let $R$ be the subset of $J$ consisting of all points of $J$ which are not
contained in $I'$. $R$ is a closed set and can be decomposed into a finite number of
intervals. The function $h$ is continuous in $R$ and therefore uniformly continuous. We
can therefore cover $R$ by a finite set of intervals such that the variation of $h$ in
every interval is less than $\epsilon$ and such that the boundary points of each interval
are continuity points of $F$. Let $I_1, I_2, \cdots, I_k$ be such a finite set of
intervals. Let $x_i$ be any point in $I_i$. Writing $\bar{R}$ for the set of points not
in $R$, we have

$$H_N = \Big| \int h(x)\, dF_N(x) - \int h(x)\, dF(x) \Big|$$
$$\le \sum_{i=1}^{k} \Big| \int_{I_i} [h(x) - h(x_i)]\, dF_N(x) - \int_{I_i} [h(x) - h(x_i)]\, dF(x) + h(x_i) \Big[ \int_{I_i} dF_N(x) - \int_{I_i} dF(x) \Big] \Big|$$
$$\quad + \Big| \int_{\bar{R}} h(x)\, dF_N(x) - \int_{\bar{R}} h(x)\, dF(x) \Big|$$
$$\le 2\epsilon + \sum_{i=1}^{k} |h(x_i)| \, \Big| \int_{I_i} dF_N(x) - \int_{I_i} dF(x) \Big| + \max |h(x)| \Big[ \int_{\bar{R}} dF_N(x) + \epsilon \Big].$$

But $\lim_{N\to\infty} \int_R dF_N(x) \ge 1 - \epsilon$. Hence

$$\limsup H_N \le 2\epsilon + 2\epsilon \max |h(x)|.$$

Since $\epsilon$ was arbitrary, we must have $\lim_{N\to\infty} H_N = 0$.
We are now prepared to prove 
THEOREM 5. Let $d_\infty(x_N) = d(x)$. Let $g(x)$ be a Borel measurable function
such that the set $R$ of discontinuity points of $g(x)$ is closed and
$P(x \subset R) = 0$. Then $d_\infty[g(x_N)] = d[g(x)]$.
PROOF: Let $F_N$ be the d.f. of $x_N$, $F$ the d.f. of $x$, $F'_N$, $F'$ the d.f.'s of
$g(x_N)$, $g(x)$ resp. Then $\lim_{N\to\infty} F_N = F$ in every continuity point of $F$.
Let $h(x)$ be defined as follows:

$$h(x) = 1 \text{ if } g(x) \le a, \qquad h(x) = 0 \text{ if } g(x) > a.$$

The discontinuities of $h$ are contained in the set $M$ of all points where $g(x) = a$
and $g$ is continuous or where $g(x)$ is discontinuous. The set $R$ of discontinuity
points of $g(x)$ is closed and of measure 0 with respect to $F$. We can therefore
subtract from $M$ a sum $R^*$ of a finite number of open intervals of arbitrarily
small measure with respect to $F$ which contains all discontinuity points of $g(x)$.
This difference set $M'$ is closed and contains only points where $g(x) = a$ and
$x \not\subset R$. If $a$ is a continuity point of $F'$, then the Borel measure of $M'$
with respect to $F$ is 0. Since $M'$ is closed, its Jordan measure is also 0. Hence the
Jordan measure of the discontinuity points of $h(x)$ is 0 if $a$ is a continuity point
of $F'$. Since $g(x)$ is Borel measurable,
$\int_{-\infty}^{+\infty} h(x)\, dF_N(x) = F'_N(a)$ and
$\int_{-\infty}^{+\infty} h(x)\, dF(x) = F'(a)$ exist for every $a$. Hence by Lemma 3
$\lim_{N\to\infty} F'_N(a) = F'(a)$ in every continuity point of $F'$ and this proves our
theorem.





3. Corollaries and applications. COROLLARY 2. If
$\operatorname{plim}_{N\to\infty} (x_N - y_N) = 0$, $d_\infty(y_N) = d(y)$ and if $f$ is
continuous except in a set $R$ for which $\lim_{N\to\infty} P(y_N \subset R) = 0$ then
$\operatorname{plim}_{N\to\infty} [f(x_N) - f(y_N)] = 0$.

PROOF: Let $I$ be a closed interval such that $P(y_N \subset I) \ge 1 - \epsilon/2$. Let
$I'$ be a sum of open intervals containing all discontinuity points of $f(x)$ in $I$ and
such that $P(y_N \subset I') \le \epsilon/2$ for sufficiently large $N$. The set $J$ of
points of $I$ which are not points of $I'$ is a closed set. Hence $f$ is uniformly
continuous in $J$ and $P(y_N \subset J) \ge 1 - \epsilon$ for sufficiently large $N$. In
Theorem 2 we put $R_N(\epsilon) = J$, $f_N = f$. Then all conditions of Theorem 2 are
satisfied and it follows that $\operatorname{plim}_{N\to\infty} [f(x_N) - f(y_N)] = 0$.

If, moreover, the set of discontinuity points of $f$ is closed then by Theorems
3 and 5 $d_\infty[f(x_N)] = d_\infty[f(y_N)] = d[f(y)]$.

Special cases of Corollary 2 have been proved by J. L. Doob and W. G.
Madow [2].

Theorem 5 is very useful in deriving limit distributions. 

It follows for instance from Theorem 5 that if $d_\infty(x_N) = d(x)$,
$d_\infty(y_N) = d(y)$, where $x$, $y$ are independently and normally distributed with
mean 0 and equal variances, then $d_\infty(x_N/y_N) = d(x/y)$. That is to say the
distribution of $x_N/y_N$ converges to a Cauchy distribution.

It also follows from Theorem 5 that under very general conditions the limit
distribution of $t = \sqrt{N}(\bar{x} - \mu)/s$ is normal. ($\bar{x}$ = sample mean,
$\mu$ = population mean, $s^2$ = sample variance.) For we have under very general
conditions $d_\infty[\sqrt{N}(\bar{x} - \mu)] = d(\xi)$, $\operatorname{plim} s = \sigma$,
where $\xi$ is normally distributed with variance $\sigma^2$.

Applying Theorem 5 it can also easily be shown that under very general
conditions the limit distribution of $T^2$ is a chi-square distribution if the means
of all variates are 0. Hotelling's $T^2$ (the generalized Student ratio) for a
p-variate distribution is defined as follows:

$$T^2 = N \sum_{i=1}^{p} \sum_{j=1}^{p} A_{ij}\, \xi_i \xi_j, \quad\text{where } \|A_{ij}\| = \|s_{ij}\|^{-1}, \quad \xi_i = \bar{x}^i,$$

where $s_{ij}$ is the sample covariance between $x^i$ and $x^j$.
We have $d_\infty(A_{ij}) = d(\sigma^{ij})$, where $\|\sigma^{ij}\| = \|\sigma_{ij}\|^{-1}$.
If $E(x^i) = 0$ for $i = 1, 2, \cdots, p$ then $d_\infty(\sqrt{N}\, \xi_i) = d(\eta_i)$ where
the $\eta_i$ have a joint normal distribution with covariance matrix $\|\sigma_{ij}\|$.
Hence

$$d_\infty(T^2) = d\Big( \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} \eta_i \eta_j \Big) = d\Big( \sum_{i=1}^{p} \zeta_i^2 \Big),$$

where the $\zeta_i$ are normally and independently distributed with variance 1.
Hence the distribution of $T^2$ converges to a chi-square distribution with $p$ degrees
of freedom.
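
The chi-square limit of $T^2$ can be illustrated by simulation for $p = 2$. The sketch below is not from the paper: the population in `draw_pair` is a hypothetical non-normal one with mean 0 and correlated coordinates, the sample covariance uses the divisor $N - 1$ (an assumption about the convention), and the sample sizes are arbitrary.

```python
import math
import random

def hotelling_t2(sample):
    """T^2 = N * sum_ij A_ij * xbar_i * xbar_j for a list of (x1, x2) pairs (p = 2),
    where A is the inverse of the sample covariance matrix."""
    n = len(sample)
    m1 = sum(u for u, _ in sample) / n
    m2 = sum(v for _, v in sample) / n
    s11 = sum((u - m1) ** 2 for u, _ in sample) / (n - 1)
    s22 = sum((v - m2) ** 2 for _, v in sample) / (n - 1)
    s12 = sum((u - m1) * (v - m2) for u, v in sample) / (n - 1)
    det = s11 * s22 - s12 * s12
    # Entries of the inverse of the 2x2 sample covariance matrix.
    a11, a22, a12 = s22 / det, s11 / det, -s12 / det
    return n * (a11 * m1 * m1 + 2 * a12 * m1 * m2 + a22 * m2 * m2)

def draw_pair(rng):
    """Hypothetical non-normal population with mean 0 and correlated coordinates."""
    u = rng.uniform(-1.0, 1.0)
    v = 0.5 * u + rng.uniform(-1.0, 1.0)
    return u, v

rng = random.Random(1)
n_obs, reps = 200, 1000
t2_values = [hotelling_t2([draw_pair(rng) for _ in range(n_obs)]) for _ in range(reps)]
mean_t2 = sum(t2_values) / reps
# For chi-square with 2 degrees of freedom, P(T^2 <= c) = 1 - exp(-c / 2),
# so the median is 2 * log(2).
frac_below_median = sum(t <= 2 * math.log(2) for t in t2_values) / reps
```

If the limit theorem applies, `mean_t2` should be near 2 (the mean of a chi-square variate with 2 degrees of freedom) and `frac_below_median` near 1/2.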
If the samples are drawn from a sequence of populations $\{\pi_N\}$ all with the
same covariance matrix and such that $\lim_{N\to\infty} \sqrt{N}\, \mu^i_N = \mu^i$ where
$\mu^i_N$ is the mean value of the ith variate in the Nth population, then one sees in
exactly the same way that the limit distribution of $T^2$ is a non-central chi-square
distribution with $p$ degrees of freedom.

The limit distribution of $T^2$ has been derived by W. G. Madow [2].

COROLLARY 3. Let $x_N$, $y_N$ be r-dimensional vectors, $d_\infty(y_N) = d(y)$ and
$x_N - y_N = O_p[f(N)]$ with $\lim_{N\to\infty} f(N) = 0$. Let $g(x)$ be a function
admitting continuous jth derivatives except in a set $R$ with
$\lim_{N\to\infty} P(y_N \subset R) = 0$. Let $T_j(x_N, y_N)$ denote the sum of the terms
of orders 1 through $j$ in the Taylor expansion of $g(x_N)$ about the point $y_N$;
then

$$g(x_N) - g(y_N) - T_j(x_N, y_N) = o_p\{[f(N)]^j\}.$$

Since the jth derivatives are continuous except in a set of limit measure 0
we can determine a closed set $R(\epsilon)$ on which they are uniformly continuous and
so that $P(y_N \subset R(\epsilon)) \ge 1 - \epsilon$ for sufficiently large $N$. Then for
every sequence with $a_N - b_N = O(f(N))$, $b_N \subset R(\epsilon)$ we have

$$g(a_N) - g(b_N) - T_j(a_N, b_N) = o\{[f(N)]^j\}.$$

Hence Corollary 3 follows from Theorem 1.

Corollary 3 was first proved by W. G. Madow [2] and J. L. Doob [1] for the
important case that $y_N$ is a constant.

The following example will illustrate Corollary 3. Let $x$, $y$ be normally and
independently distributed random variables with mean 0 and variance 1;
$\{\epsilon_N\}$, $\{\epsilon'_N\}$ sequences of random variables with
$\operatorname{plim}_{N\to\infty} \sqrt{N}\, \epsilon_N = \operatorname{plim}_{N\to\infty} \sqrt{N}\, \epsilon'_N = 1$.
Let $x_N = x + \epsilon_N$, $y_N = y + \epsilon'_N$. We consider the function
$g(x, y) = x^3/3 + y^3/3 + 2x - 2y + 5$. Applying Corollary 1 it is easy to verify that
$g(x_N, y_N) - g(x, y) = \Omega_p[1/\sqrt{N}]$, $\epsilon_N = O_p(1/\sqrt{N})$,
$\epsilon'_N = O_p(1/\sqrt{N})$. Hence applying Corollary 3 for $j = 1$ we have

$$g(x_N, y_N) - g(x, y) - (x^2 + 2)\epsilon_N - (y^2 - 2)\epsilon'_N = o_p(1/\sqrt{N}).$$

Multiplying by $\sqrt{N}$ we have

$$[g(x_N, y_N) - g(x, y)]\sqrt{N} - [(x^2 + 2)\epsilon_N + (y^2 - 2)\epsilon'_N]\sqrt{N} = o_p(1).$$

This is equivalent to

$$\operatorname{plim}_{N\to\infty} \{\sqrt{N}\,[g(x_N, y_N) - g(x, y)] - (x^2 + y^2)\} = 0.$$

Hence the distribution of $\sqrt{N}(g(x_N, y_N) - g(x, y))$ converges to the chi-square
distribution with 2 degrees of freedom.
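
The example can be reproduced numerically. The sketch below is illustrative only: the perturbations $\epsilon_N = (1 + z/\sqrt{N})/\sqrt{N}$ with $z$ standard normal are one assumed choice satisfying $\operatorname{plim} \sqrt{N}\, \epsilon_N = 1$, and the names `g`, `scaled_difference` and the value of $N$ are hypothetical.

```python
import math
import random

def g(x: float, y: float) -> float:
    """The function of the example: x^3/3 + y^3/3 + 2x - 2y + 5."""
    return x ** 3 / 3 + y ** 3 / 3 + 2 * x - 2 * y + 5

def scaled_difference(n: int, rng: random.Random):
    """Return (sqrt(N) * [g(x_N, y_N) - g(x, y)], x**2 + y**2) for one draw."""
    x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    root_n = math.sqrt(n)
    # Assumed perturbations: sqrt(N) * eps converges in probability to 1.
    eps_x = (1.0 + rng.gauss(0.0, 1.0) / root_n) / root_n
    eps_y = (1.0 + rng.gauss(0.0, 1.0) / root_n) / root_n
    diff = g(x + eps_x, y + eps_y) - g(x, y)
    return root_n * diff, x * x + y * y

rng = random.Random(4)
n = 10 ** 8
pairs = [scaled_difference(n, rng) for _ in range(2000)]
# Largest observed gap between the scaled difference and x^2 + y^2.
max_gap = max(abs(s - t) for s, t in pairs)
# A chi-square variate with 2 degrees of freedom has mean 2.
mean_scaled = sum(s for s, _ in pairs) / len(pairs)
```

For large `n` the scaled difference is close to $x^2 + y^2$ draw by draw, so its empirical mean should be close to 2.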
If $\operatorname{plim}_{N\to\infty} x_N = a$ and $\{\sigma_N\}$ is a sequence of numbers
with $\lim_{N\to\infty} \sigma_N = 0$ such that
$d_\infty[(x^i_N - a^i)/\sigma_N] = d(\xi_i)$ where the $\xi_i$ are constants or random
variables and if $g$ admits continuous first derivatives at $x = a$ at least one of which
is different from 0, then putting
$\big( \partial g / \partial x^i \big)_{x = a} = g_i$, we have

$$g(x_N) - g(a) = g_1(x^1_N - a^1) + \cdots + g_p(x^p_N - a^p) + o_p(\sigma_N).$$

Hence applying Theorems 3 and 5 we have

$$\text{(i)} \qquad d_\infty\Big[ \frac{g(x_N) - g(a)}{\sigma_N} \Big] = d(g_1 \xi_1 + \cdots + g_p \xi_p).$$

That is to say the distribution of $[g(x_N) - g(a)]/\sigma_N$ converges to the
distribution of $\sum_{i=1}^{p} g_i \xi_i$ in all continuity points of the latter. A
corresponding result can be obtained from Corollary 3 if all first derivatives are 0 at
$x = a$ and at least one second derivative is different from 0 and so forth.

A method of deriving limiting distributions and limit standard deviations based
on (i) is known as the δ-method and has been extensively applied in statistical
literature.
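
Relation (i) can be illustrated by a small simulation in one dimension. The sketch below is not from the paper and rests on assumed choices: $x_N$ is the mean of $N$ Exponential(1) draws (so $a = 1$ and $\sigma_N = 1/\sqrt{N}$, with $\xi$ standard normal by the central limit theorem) and $g(x) = x^2$, so that $g_1 = 2$.

```python
import math
import random

def scaled_error(n: int, rng: random.Random) -> float:
    """[g(xbar_N) - g(a)] / sigma_N for g(x) = x**2, Exponential(1) data,
    a = 1 and sigma_N = 1 / sqrt(N)."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (xbar ** 2 - 1.0)

rng = random.Random(2)
n_obs, reps = 400, 2000
values = [scaled_error(n_obs, rng) for _ in range(reps)]
mean = sum(values) / reps
variance = sum((v - mean) ** 2 for v in values) / (reps - 1)
# Relation (i) predicts the limit g_1 * xi with g_1 = 2 and xi normal with
# variance 1, that is, a limiting variance of 4.
predicted_variance = 4.0
```

The empirical `variance` should be close to `predicted_variance`, which is the usual way the δ-method is used to obtain a limit standard deviation (here 2).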


REFERENCES 


[1] J. L. Doob, "The limiting distribution of certain statistics," Annals of Math. Stat., Vol.
6 (1935).

[2] W. G. Madow, "Limiting distribution of quadratic and bilinear forms," Annals of
Math. Stat., Vol. 11 (1940).





ON A MEASURE PROBLEM ARISING IN THE THEORY OF 
NON-PARAMETRIC TESTS


By Henry Scheffé


Princeton University 


1. Introduction. While the contents of this paper have broader statistical
implications, they were motivated by the following problem: Given two samples,
$(Y_1, Y_2, \cdots, Y_m)$ and $(Z_1, Z_2, \cdots, Z_n)$ from univariate populations with
cumulative distribution functions (c.d.f's) $F(x)$ and $G(x)$, respectively, and
given furthermore that $F$ and $G$ are members of a certain class $\Omega$ of c.d.f's, to
test the hypothesis that $F = G$. We shall refer to this as "the problem of two
samples" [8]. It is an example of what Wolfowitz has called problems of the
non-parametric case [8].

For the theory of non-parametric problems the following classification of
c.d.f's is appropriate: Let $\Omega_0$ be the class of all univariate c.d.f's, that is, the
class of all monotone non-decreasing functions $F(x)$ for which $F(-\infty) = 0$,
$F(+\infty) = 1$, and $F(x) = F(x + 0)$. For every $F \in \Omega_0$ we may conceive of a
corresponding random variable $X$ such that $\Pr\{X \le x\} = F(x)$. For some
purposes we may desire to rule out the class $\Omega'$ of degenerate c.d.f's given by the
formula $F(x) = 0$ for $x < x_0$, $F(x) = 1$ for $x \ge x_0$, where $x_0$ is any real number.
Let then $\Omega_1$ be the class of non-degenerate c.d.f's, $\Omega_1 = \Omega_0 - \Omega'$.
Let $\Omega_2$ be the class of all continuous $F(x)$, and let $\Omega_3$ be the class of all
absolutely continuous $F(x)$, that is, all $F(x)$ for which there exists a probability
density function (p.d.f.) $f(x)$ such that

$$\text{(1)} \qquad F(x) = \int_{-\infty}^{x} f(t)\, dt.$$

Finally, let $\Omega_4$ be the class of all $F(x)$ which may be expressed in the form (1)
with $f(x)$ continuous.

Various solutions of non-parametric problems have been given under the
restriction that the c.d.f's belong to one of the classes $\Omega_i$. For example,
Kolmogoroff [2] has indicated how a confidence belt for an unknown $F$ may be
formed with no assumptions on $F$, that is $F \in \Omega_0$. Wald and Wolfowitz earlier¹
gave a more general solution of the same problem [5], and also of the problem
of two samples [6], under the restriction that the c.d.f's are members of $\Omega_2$.
The latter problem was considered by Dixon [1] for the c.d.f’s in Q;. Wilks’ 
theory of tolerance intervals [7] assumes F ¢2,. The class 2; has been defined 
above because it is ordinarily the largest class of statistical interest. We note 


$$\text{(2)} \qquad \Omega_0 \supset \Omega_1 \supset \Omega_2 \supset \Omega_3 \supset \Omega_4.$$


¹ See, however, a still earlier paper by Kolmogoroff [11] in which he gave the
distribution theory required for his solution.




It is to be understood throughout that the word “region” (also the symbol w) 
always denotes a Borel set in a k-dimensional (k > 1) sample space W (Euclidean). 
A “null set” will always mean a Borel set of measure zero. 

Returning now to the problem of two samples, let $m + n = k$, $X_i = Y_i$
$(i = 1, 2, \cdots, m)$, $X_i = Z_{i-m}$ $(i = m + 1, \cdots, k)$. Denote by $E$ the point
$(X_1, \cdots, X_k)$. Proceeding along the lines of the usual parametric theory,
we may seek a region $w$ (the "critical region") such that $\Pr\{E \in w\}$ is the same
constant $\alpha$ ("significance level"; $\alpha \ne 0$ or 1) for all $F$ in a particular class
$\Omega_i$ if $F = G$. This raises the following question: Define

$$P(w \mid F) = \int_w dF_k(x_1, \cdots, x_k),$$

$$F_k(x_1, x_2, \cdots, x_k) = \prod_{j=1}^{k} F(x_j).$$

We shall say that a region $w$ has the property $\pi_i$ if for all $F \in \Omega_i$,
$\alpha = P(w \mid F)$ is independent of $F$ and $0 < \alpha < 1$. The question then is, for a
fixed $i$, how can we characterize regions $w$ with the property $\pi_i$? Partial answers to
this question are given in the next section.

In the language of measure theory the question is this: Let $\mu$ be any measure
on the real line, such that the measure of the whole line is unity, and form the
"power" measure $\mu^k$ in Euclidean k-space, that is, the product measure obtained
by using $\mu$ on each axis. For certain large classes $C_i$ (corresponding to the
$\Omega_i$ defined above, $i = 1, 2, 3, 4$) of measures $\mu$, what can we say about the
existence and structure of sets of points in the k-space which have the property that
their "power" measure is the same for all measures $\mu$ in $C_i$?


2. Theorems. Our first theorem tells us that if we want regions $w$ with the
desired property, we must restrict $F$ to a smaller class than $\Omega_1$.

THEOREM 1: There is no $w$ with the property $\pi_1$.

To prove the theorem, suppose the contrary. Then there exists a $w$ for which
$P(w \mid F) = \alpha$ for all $F \in \Omega_1$ and $\alpha \ne 0$ or 1. Let $L$ be the line
$x_1 = x_2 = \cdots = x_k$, and suppose first there is a point $E_0$ of $L$ in $w$. Let
$E_0 = (a, a, \cdots, a)$, and let $F_h(x)$ be any $F \in \Omega_1$ such that
$\Pr\{X = a \mid F_h\} = h$ $(0 < h < 1)$. Then

$$\alpha = P(w \mid F_h) \ge P(E_0 \mid F_h) = \Pr\{\text{all } X_j = a \mid F_h\} = \prod_{j=1}^{k} \Pr\{X_j = a \mid F_h\} = h^k.$$

By hypothesis $\alpha$ is independent of $h$. But $h$ may be chosen arbitrarily close to 1.
Hence $\alpha = 1$, a contradiction. If no points of $w$ lie on $L$, the above reasoning
applies to $w' = W - w$, since $\alpha' = P(w' \mid F) = 1 - \alpha$ is independent of
$F \in \Omega_1$, and $w'$ contains an $E_0$ on $L$, therefore $\alpha' = 1$, $\alpha = 0$.













In order to see what kind of structure might yield a $w$ of the desired type,
let us for the moment consider the class $\Omega_3$ of c.d.f's. Then there exists a p.d.f.
over $W$, namely $f(x_1) f(x_2) \cdots f(x_k)$. For any $f(x)$ and any point² $E$, this
p.d.f. has the same value at all points $E'$ whose coordinates are permutations of the
coordinates of $E$. This suggests that suitable regions $w$ can be built up by
considering points $E$ for which no two coordinates are equal and putting a fixed
fraction of the set $\{E'\}$ in $w$ in such a way that $w$ is a Borel set. Our next
theorem justifies this process for the wider class $\Omega_2$.

Let us say that $w$ has the structure $S$ if for every point $E = (x_1, \cdots, x_k)$ with
no two coordinates equal, $M$ points $(0 < M < k!)$ of the set $\{E'\}$, obtained by
permuting the coordinates of $E$, are in $w$ and the remaining $k! - M$ are not.³

THEOREM 2: A sufficient condition that $w$ have the property $\pi_2$ is that it have the
structure $S$.

In proving the theorem it will be convenient to separate the k! points of
every set {E′} by means of regions $u_i$ ($i = 1, \dots, k!$), such that each $u_i$ contains
one and only one point of {E′}. Order the k! permutations of the integers
$1, 2, \dots, k$ in any manner so that $(1, 2, \dots, k)$ is the first. Let $(p_{i1}, \dots, p_{ik})$
be the i-th permutation ($i = 1, 2, \dots, k!$) and define $u_i$ as the region
$x_{p_{i1}} < x_{p_{i2}} < \cdots < x_{p_{ik}}$. The collection $\{u_i\}$ is disjoint and covers all of W except
the set H of points on hyperplanes $x_i = x_j$ ($i \ne j$). The transformation
$T_i$: $x_{p_{i1}} \to x_1, \dots, x_{p_{ik}} \to x_k$ maps $u_i$ onto $u_1$ in such a way that $F_k$ remains
invariant.

Suppose now that w satisfies the conditions of the theorem. The removal
of $H \cap w$ from w does not⁴ affect $P(w \mid F)$ for any $F \in \Omega_2$. Hence

\[
P(w \mid F) = \sum_{i=1}^{k!} P(w \cap u_i \mid F) = \sum_{i=1}^{k!} \int_{w \cap u_i} dF_k = \sum_{i=1}^{k!} \int_{u_i} c_{w \cap u_i}(E)\, dF_k,
\]

where $c_S(E)$ denotes the characteristic function of a set S, that is, $c_S(E) = 1$
if $E \in S$, 0 otherwise. Next map each of the regions $u_i$ onto $u_1$ by means of $T_i$.
$F_k$ is invariant, while $c_{w \cap u_i}(E) \to h_i(E)$ such that $\sum_{i=1}^{k!} h_i(E) = M$ for $E \in u_1$.
Then

\[
P(w \mid F) = \sum_{i=1}^{k!} \int_{u_1} h_i(E)\, dF_k = \int_{u_1} \sum_{i=1}^{k!} h_i(E)\, dF_k = M \int_{u_1} dF_k.
\]


² Previously E denoted a random point $(X_1, \dots, X_k)$, now it denotes an arbitrary point
$(x_1, \dots, x_k)$ in the sample space W. This will cause no confusion.

³ Regions of structure S may be regarded as the result of applying R. A. Fisher's
randomization process [10] in the most general possible way to the problem of two samples.
Special cases of regions with structure S have been considered by Feller [9] and Neyman
[12], and are implied by all writers [e.g., 6] who have attacked the problem of two samples
by the method of ranks.

⁴ This may be seen by writing $P(H \mid F)$ in the form of an integral over W of $c_H(E)\, dF_k$,
where $c_H(E)$ is the characteristic function of the set H, and applying the Fubini theorem [4].


HENRY SCHEFFE 


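But

\[
1 = P(W \mid F) = \sum_{i=1}^{k!} \int_{u_i} dF_k,
\]

and by use of $T_i$ we find

\[
\int_{u_i} dF_k = \int_{u_1} dF_k,
\]

so that

\[
\int_{u_1} dF_k = 1/k!,
\]

and

\[
P(w \mid F) = M/k!
\]

for all $F \in \Omega_2$. Thus w has the property $\pi_2$.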
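
The counting behind Theorem 2 can be checked in a small case. The Python sketch below is an illustration only, under assumptions that are not in the paper: k = 3, the region w is taken to be the union of M = 2 of the regions u_i (one particular way of obtaining structure S), the distribution used for the Monte Carlo check is exponential, and the helper names `rank_pattern`, `in_w` and `orbit_count` are hypothetical.

```python
import itertools
import math
import random


def rank_pattern(point):
    """Coordinate indices listed in ascending order of value.

    For a point with distinct coordinates this identifies the region u_i
    that contains the point.
    """
    return tuple(sorted(range(len(point)), key=lambda j: point[j]))


def in_w(point, accepted):
    """Hypothetical region w: union of the regions u_i with accepted patterns."""
    return rank_pattern(point) in accepted


def orbit_count(point, accepted):
    """Number of points of {E'} (coordinate permutations of E) lying in w."""
    return sum(in_w(p, accepted) for p in itertools.permutations(point))


k = 3
# Assumption for the example: accept the first two of the k! = 6 patterns.
accepted = set(list(itertools.permutations(range(k)))[:2])

M = orbit_count((0.7, -1.2, 3.4), accepted)   # same M for every such point
alpha = M / math.factorial(k)                 # value predicted by Theorem 2

# Monte Carlo check with one arbitrarily chosen continuous distribution.
rng = random.Random(0)
n_trials = 20000
hits = sum(
    in_w([rng.expovariate(1.0) for _ in range(k)], accepted)
    for _ in range(n_trials)
)
estimate = hits / n_trials
```

Replacing the exponential draw by any other continuous distribution should leave `estimate` close to `alpha`, which is the content of the property $\pi_2$.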

H is an example of a set in the class $N_2$ of regions w for which $P(w \mid F) = 0$
for all $F \in \Omega_2$. Since if regions $w_1$ and $w_2$ differ by a set in $N_2$, $P(w_1 \mid F) =
P(w_2 \mid F)$ for all $F \in \Omega_2$, we have

COROLLARY 1: A sufficient condition that w have the property $\pi_2$ is that it differ from a region
with structure S by a region in $N_2$.

Defining similarly the class $N_3$ as that class of regions w for which $P(w \mid F) = 0$
for all $F \in \Omega_3$, we see that $N_3$ is precisely the class of null sets.

COROLLARY 2: A sufficient condition that w have the property $\pi_3$ is that it have
the structure S except for a null set.

The mildest restriction under which the writer has been able to concoct a 
necessity proof is that the boundary of w be a null set. This class of regions w 
includes (to the best of his knowledge) all critical regions heretofore used in 
practice. 

THEOREM 3: For a w whose boundary is a null set, a necessary condition that
w have the property $\pi_4$ is that it have the structure S except on a null set.

Suppose then that w has the property $\pi_4$, and its boundary B is a null set.
Let $B_i$ be the transform of B under $T_i$. Let the null set H′ be the union of H
with all $B_i$ and let $w_1 = w - H'$, $w_2 = (W - w) - H'$. Then $w_1$ and $w_2$ are open
sets and $P(w_1 \mid F) = P(w \mid F)$ for all $F \in \Omega_4$. Furthermore for any E either
all or none of the points of {E′} are in $w_1 \cup w_2$. Now consider any $E_0 \in w_1$
and let $M_0$ be the number of points of $\{E_0'\}$ in $w_1$, so that $k! - M_0$ of $\{E_0'\}$ are
in $w_2$. Let $E_0 = (\xi_1, \dots, \xi_k)$, and $2\delta_1 = \min |\xi_i - \xi_j|$ for $i \ne j$. Since $w_1$
and $w_2$ are open, cubes with sides parallel to the coordinate hyperplanes ($x_i$ =
constant) and edges of length $2\delta_2$ may be centered on the points $\{E_0'\}$ so that each
cube is entirely in $w_1$ or entirely in $w_2$, by choosing $\delta_2$ sufficiently small. Choose
$\delta$ so that $\delta > 0$, $\delta < \delta_1$, $\delta < \delta_2$. The set $\{E_0'\}$ is a subset of the set $\{E_0''\}$ of
$k^k$ points whose coordinates are in the set $\xi_1, \dots, \xi_k$ allowing repetitions. For
each point $E_0'' = (\xi_{i_1}, \dots, \xi_{i_k})$ in $\{E_0''\}$ construct a cube $C_{i_1, \dots, i_k}$ as above
with center at $E_0''$ and edge $2\delta$. These cubes are disjoint. Let $f_i(x)$ be a p.d.f.
such that the corresponding c.d.f. is in $\Omega_4$ and $f_i(x) = 0$ for $|x - \xi_i| > \delta$ ($i = 1,
\dots, k$). Define the p.d.f.

\[
f^{(s)}(x) = s^{-1} \sum_{i=1}^{s} f_i(x).
\]

Then the corresponding c.d.f. $F^{(s)}$ is in $\Omega_4$. We have


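\[
P(w \mid F^{(s)}) = \int_w \prod_{j=1}^{k} f^{(s)}(x_j)\, dW
= s^{-k} \int_w \sum_{i_1, \dots, i_k = 1}^{s} f_{i_1}(x_1) \cdots f_{i_k}(x_k)\, dW,
\]

where $dW = dx_1 \cdots dx_k$. Bring the last summation sign outside the integral
sign, and note that $f_{i_1}(x_1) \cdots f_{i_k}(x_k) = 0$ outside $C_{i_1, \dots, i_k}$. Then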


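\[
\sum_{i_1, \dots, i_k = 1}^{s} J_{i_1, \dots, i_k} = s^k \alpha, \tag{3}
\]

where

\[
J_{i_1, \dots, i_k} = \int_{w \cap C_{i_1, \dots, i_k}} f_{i_1}(x_1) \cdots f_{i_k}(x_k)\, dW. \tag{4}
\]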


Our argument depends on certain sums of $J_{i_1, \dots, i_k}$ having the property that
the sum is equal to $\alpha$ times the number of terms in the sum. In order to save
space we shall say that if $\Sigma$ is such a sum, then $\Sigma \in R$, R being the class of such
sums. Clearly all sums (3) are in R. Let $\{S_{r\nu}\}$ be the subsets of r ($r = 1, \dots,
k$) different integers in the set $1, 2, \dots, k$ ($\nu = 1, \dots, \binom{k}{r}$), and let $\Sigma_{r\nu}$ be the
sum of all $J_{i_1, \dots, i_k}$ for which the index $i_1, \dots, i_k$ consists only of integers in
$S_{r\nu}$ and such that all the integers of $S_{r\nu}$ appear in the index. We wish to prove
that $\Sigma_{k1}$, the sum of J for cubes centered on the points of $\{E_0'\}$, is in R. To
accomplish this we make an induction on r: If we assume all $\Sigma_{r\nu} \in R$ for $r < s$, then
we can show all $\Sigma_{s\nu} \in R$ ($s = 2, \dots, k$). No generality is lost in taking $S_{s\nu}$ as
the set of integers $1, 2, \dots, s$. Now consider the left member of (3). Some
thought will show⁵ that it may be broken down into $\Sigma_{s\nu}$ plus a sum of $\Sigma_{r\nu}$ where
$r < s$. But the left member of (3) is in R, and by hypothesis so are all $\Sigma_{r\nu}$ with
$r < s$. It follows that $\Sigma_{s\nu}$ is also in R. To see that $\Sigma_{1\nu} \in R$ ($\nu = 1, \dots, k$), let


⁵ To illustrate the reasoning, suppose s = 4. If $S_{r\nu}$ is the set of (different) integers a,
b, ..., h, denote $\Sigma_{r\nu}$ by <a, b, ..., h>, that is, <a, b, ..., h> is the sum of all J whose
indices contain a, b, ..., h and no other integers. Then the right member of (3) contains
terms from <1, 2, 3, 4>; <1, 2, 3>, <1, 2, 4>, <1, 3, 4>, <2, 3, 4>; <1, 2>, <1, 3>,
<1, 4>, <2, 3>, <2, 4>, <3, 4>; <1>, <2>, <3>, <4>. Every term of the right
member of (3) is in one of these sums < >. No term can appear in 2 sums < >. Every
term of each sum < > appears in the right member of (3). Thus the right member is the
sum of all sums < > listed above, and by hypothesis, all but the first sum < > are in R.




$S_{1\nu}$ be $\nu$ and note that $\Sigma_{1\nu}$ consists only of $J_{\nu, \dots, \nu}$. Putting s = 1 in (3) we
have $J_{1, \dots, 1} = \alpha$, and likewise $\Sigma_{1\nu} = J_{\nu, \dots, \nu} = \alpha$. Thus $\Sigma_{1\nu} \in R$.

We have at this stage that $\Sigma_{k1} = k!\,\alpha$. But as we already noted, of the cubes
C associated with the integrals J in the sum $\Sigma_{k1}$, $M_0$ are entirely inside $w_1$ and
$k! - M_0$ entirely outside $w_1$. For the set of $M_0$ terms in $\Sigma_{k1}$ corresponding to
the cubes C in $w_1$ the region of integration $w \cap C$ in (4) is actually C, and for the
remaining set of terms in $\Sigma_{k1}$ the region of integration is the empty set. Furthermore
if $w \cap C = C$ in (4), the corresponding J is unity. Hence $\Sigma_{k1} = M_0 = k!\,\alpha$,
$\alpha = M_0/k!$. If we now repeated the process with any other point $E_1 \in w_1$ instead
of $E_0$, and let $M_1$ be the number of points of $\{E_1'\}$ in $w_1$, we would get
$\alpha = M_1/k!$. Therefore $M_1 = M_0$. From $0 < \alpha < 1$, we conclude $0 < M_0 < k!$.
Thus $w_1$ has the structure S.

The exceptional null set allowed for in the statement of Theorem 3 entered
the proof when we removed $w \cap H'$ from w. Had we assumed that the boundary
$B \in N_2$, then the exceptional set would be in $N_2$. As a corollary to the reasoning
used in the proof we thus get

COROLLARY 3: If the boundary of w is in $N_2$, a necessary condition that w have
the property $\pi_4$ is that w have the structure S except on a subset in $N_2$.

Finally, because of (2), any sufficient (necessary) condition for w to have the
property $\pi_i$ is sufficient (necessary) for w to have the property $\pi_j$ if $j > i$ ($j < i$).
Hence we may replace $\pi_2$ in Theorem 2 and Corollary 1 by $\pi_3$ or $\pi_4$, $\pi_3$ in Corollary 2
by $\pi_4$, $\pi_4$ in Theorem 3 and Corollary 3 by $\pi_3$ or $\pi_2$. This yields

COROLLARY 4: If the boundary of w is a null set, a necessary and sufficient condition
that w have the property $\pi_3$ (or $\pi_4$) is that it have the structure S except on a
null set.

COROLLARY 5: If the boundary of w is a region in $N_2$, a necessary and sufficient
condition that w have the property $\pi_2$ (or $\pi_3$ or $\pi_4$) is that it have the structure S except
on a subset in $N_2$.


3. Remarks. Wald and Wolfowitz [6, 8] in their work on the problem of two
samples for the case $F \in \Omega_2$ have imposed the following restriction on any statistic
used to test the null hypothesis: The statistic must be a function of V only,
where the sequence V of k elements is formed as follows: Rank the $X_i$ of the
sample in ascending order of magnitude (ignoring cases where two $X_i$ are equal),
and if the i-th element in this rank order is a Y put the i-th element of V equal
to zero, else unity. This means that the resulting critical region always consists
of the union of s of the regions $u_i$ defined in section 2, where s is a multiple of
$m!\,n!$. The results of our section 2 show that this restriction is not necessary if
all we require is that $\Pr\{E \in w\}$, where w is the critical region and E the sample
point, be the same constant $\alpha$ whenever the null hypothesis is true. In fact a
valid (but probably not very efficient) solution of the problem of two samples
has been proposed by Pitman [3] in which the statistic is not a function of V only.
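
The construction of the sequence V described above can be sketched in a few lines of Python. This is an illustrative reading of the verbal rule, not code from the paper; the function name `rank_sequence` and the sample values are hypothetical, and ties are assumed absent, as in the text.

```python
import math


def rank_sequence(ys, zs):
    """Return V: 0 where the i-th smallest pooled value is a Y, else 1.

    Assumes no two pooled values are equal (ties are ignored in the text).
    """
    pooled = [(value, 0) for value in ys] + [(value, 1) for value in zs]
    pooled.sort(key=lambda pair: pair[0])
    return [label for _, label in pooled]


ys = [1.0, 4.0]          # first sample (m = 2), hypothetical values
zs = [2.0, 3.0, 5.0]     # second sample (n = 3), hypothetical values
V = rank_sequence(ys, zs)

# V is unchanged by a strictly increasing transformation of the scale.
V_transformed = rank_sequence([math.exp(y) for y in ys],
                              [math.exp(z) for z in zs])
```

Any statistic computed from `V` alone therefore cannot distinguish two data sets that differ only by a monotone change of scale, which is the restriction the remarks discuss.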

Putting further requirements on the critical region will lead to a more restricted
class than the class of regions having essentially the structure S. For instance,
from section 2 it follows that the significance level $\alpha$ can be any of the values
$i/k!$ ($i = 1, \dots, k! - 1$). But if we lay down a symmetry condition to the
effect that if $(y_1, \dots, y_m, z_1, \dots, z_n)$ is in w, all points obtainable by permuting
the y's among themselves and the z's among themselves be in w, then $\alpha$ must be
a multiple of $m!\,n!/k!$. Again, if we impose the condition that any statistic
$T(X_1, \dots, X_k)$ used to test the null hypothesis remain invariant when all the
$X_i$ are subjected to the same topological transformation of the real line onto itself,
then Wald and Wolfowitz [6] have shown that T must be a function of V
only, so that w has the special structure described above. It would seem desirable,
when the subject of statistical inference in the non-parametric case may
be entering a stage of rapid development, to be clear about the assumptions
necessary to restrict the critical region to a particular class.
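
The two sets of attainable significance levels just described can be listed explicitly for small samples. The following Python sketch only enumerates the multiples named in the text; the function name `attainable_levels` and the choice m = n = 2 are assumptions made for the illustration.

```python
from fractions import Fraction
from math import factorial


def attainable_levels(m, n, symmetric):
    """Levels strictly between 0 and 1 for k = m + n observations.

    Without the symmetry condition the step is 1/k!; with it the step is
    m! n! / k!, as stated in the text.
    """
    k = m + n
    if symmetric:
        step = Fraction(factorial(m) * factorial(n), factorial(k))
    else:
        step = Fraction(1, factorial(k))
    count = int(1 / step)
    return [j * step for j in range(1, count)]


general_levels = attainable_levels(2, 2, symmetric=False)    # steps of 1/24
symmetric_levels = attainable_levels(2, 2, symmetric=True)   # steps of 1/6
```

With m = n = 2 the symmetry condition reduces the 23 available levels to 5, which shows how quickly an added requirement narrows the class of critical regions.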

In concluding these remarks, we quote with the kind permission of Dr. Wolfowitz,
from some correspondence with the writer. Important work has been done
on non-parametric tests under the restriction that the statistic used be invariant
under topological transformation. The following statement as to why this restriction
might be imposed will therefore interest the reader: “... there are
arguments pro and con ... Pro: If the statistic be not invariant, this could
happen: Two scientists working on the same problem and having the same
observations to interpret might come to opposite conclusions if one used one
scale of measurement and the other used a monotone function of that scale.
Con: The criterion of topologic invariance of the statistic is a restriction on our
freedom. Furthermore it cannot be imposed except in the univariate case
([8], p. 270).”


REFERENCES 

[1] W. J. Dixon, Annals of Math. Stat., Vol. 11 (1940), pp. 199-204.

[2] A. Kolmogoroff, Annals of Math. Stat., Vol. 12 (1941), pp. 461-463.

[3] E. J. G. Pitman, J. Roy. Stat. Soc. Suppl., Vol. 4 (1937), pp. 117-130.

[4] S. Saks, Theory of the Integral, Warsaw, 1937.

[5] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 10 (1939), pp. 105-118.

[6] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 11 (1940), pp. 147-162.

[7] S. S. Wilks, Annals of Math. Stat., Vol. 12 (1941), pp. 91-96.

[8] J. Wolfowitz, Annals of Math. Stat., Vol. 13 (1942), pp. 247-279.

[9] W. Feller, Stat. Res. Mem., Vol. 2 (1938), pp. 107-125.

[10] R. A. Fisher, Statistical Methods for Research Workers, section 24, example 19; The
Design of Experiments, section 21; J. Roy. Anthrop. Soc., Vol. 66 (1936), pp.
57-63.

[11] A. Kolmogoroff, Gior. Ist. Ital. Attuari, Vol. 4 (1933), pp. 83-91.

[12] J. Neyman, J. Roy. Stat. Soc., Vol. 105 (1942), pp. 311-312.













FURTHER RESULTS ON PROBABILITIES OF A FINITE NUMBER 
OF EVENTS 


By Kai Lai Chung
Tsing Hua University, Kunming, China 





In a recent paper’ the author has generalized some inequalities of Fréchet to 
the following: 


Let $n \ge a \ge m \ge 1$, and let
-1 
n—-m (m) _ 4 (m) 
c ae > Fe ((v)) aes Ag . 


AF (a) F(a) — F(a+1), A’F(a) = A(d*"'F(a)); 


then 
AAS” = 0, AAS” > 0. 


Using a generalized Poincaré's formula, P. L. Hsu has improved these inequalities
to the recurrence formula stated below.
Hsu's formula is


Bi (1) AA = —™_ git), 
n—m 
Proof: We have

C b- 1 
pa((a)) = 3 (-1" (57 ) ssa). 
h beam ™ - 1 
' For a fixed ‘‘a” summing over all (a) ¢ (v), 


| S pa((a)) = Xe (= 1" (r- Na i ) _— 


(a)e(» P 
1 (Bere eaiat) ww 
: aa = IME [G=m) 
é (FR VIGIL) s@n- co 
°F So) 
. Ee : ') .. (-1)" i. — A - i) Ss((v)) 
m 


=" Asi”, QED. 



























¹ “On the probability of the occurrence of at least m events among n arbitrary events,”
Annals of Math. Stat., Vol. 12 (1941), pp. 328-338. We use throughout the same notation
used in this paper, and that referred to in footnote 3.




PROBABILITIES 


Applying the formula repeatedly, we obtain forO < h < n — 


is a+m—1\/n— m\" im 
wage = (m= mm)" ge 


Since every A 2 0, we have, forO Sh Sn —<a, 
A*AM™ = 0, 


which includes my former results. 
Further, we may write (1) as 


(2) (n — a)PS™ = (a +1 — m)PS + mPSY 
or 
(a + 1)Pcti — (n — a)PQ™ = m (Poth — Popt”) = mP tl 
It follows that 
(3) (a + 1)Pcti — (n — a)PQ™ 2 0. 
From (2) it also follows that 
(4) (n — a)P.™ — (a+ 1 — m)P2hi 2 0, 
which is the same as AAS” = 0. Combining (3) and (4) we obtain 
Spire srs tet 


If we take the special case 7, = 1 and instead of the original events Fi, --- , E, 
consider their negations, we easily obtain 


=54(()~ sun} « 2) - son «2244(3)~ sn} 


This is equivalent to a result given by Fréchet’. 
There is an analogue of Hsu’s formula for Pim) , as follows: 
Letn = a = m 2 1, and let 


( = a = Bi" ‘ 
e=~m 


ABI™! = =t° pint), 


It follows that forO0 Sh S n — aa, 


h\f{n — m\ 
sina (MEN mae 
a m h a ’ 


A*B'* = 0. 


² “Événements compatibles et probabilités fictives,” C. R. Acad. Sc., Vol. 208 (1939).





236 KAI LAI CHUNG 


The other results on $p_m$ in the paper¹ also have analogues for $p_{[m]}$. For the
result on conditions of existence see the author's recent paper³. Here we shall
state the following extension of Boole's inequality.

For $2l + 1 \le n - a$ and $2l \le n - a$ respectively, we have


21 ° 
-1 ("3 . " Smes((%)) S Pom(()) S$ Qe (-1) e 7 ‘) Sm4i((2)). 


Proof: We have


Smi((o)) = (” 7 Pim (()). 
Hence, 


Ey ("t*) senton = Hd (Y(t) prman(y 


i=0 h= i= 


= ploy) + (IE ")S 1 (7) pra 


= ptm ((v)) + ("* - " (— 1)’ r x ') P{m+n}((v)). 


The inequalities follow immediately. 

Finally, we record two formulas which express p,((v)) in terms of P3”'((v)) 
and in terms of P}”!((v)) for a fixed m and ranging b’s. Formulas which express 
Pa\((v)) in both ways have been given’. 

We have, 


ia 1) rm) = D(-y™ YS pal(6)) 
b=m (8) ¢ (7) 


1) 0) = OS DY pall) 
(B) € (7) 


(v) b=m 


=Scevn(2Tt) may 


(B) € (») 


b 
S.((v)) = .- 4 - | —: Pe, 


By a generalized Poincaré’s formula, we get 


pao) = De Se ( 


c=max (a,b) 


= n—a+b—m b = a a 


³ “On fundamental systems of probabilities of a finite number of events,” Annals of
Math. Stat., Vol. 14 (1943), pp. 123-134.





Similarly we have 
\-l ¢ 
__ -S __1\t—™ n—b [m] 
sao =(S) Xun (2b) wh 


peo) = Hewrnd Soe (Er (RTE) ber 


It remains to be seen whether the series in the curly brackets can be summed.
Using a formula in footnote 3, we may obtain the desired formula in another
way. We have, in fact,


p(0)) = Z Pall) 
ES cpr) (e) pion 


c=a b=m+n—c n= m 


“Fe (Sor ")(e) porn 


+ {= cy (P (2) \ PI"(()). 


The “‘complete”’ series 


Sor 6 


The “incomplete” series we denote by 


n oe b ace =§ n—a 
K(n, a,b, m) = Do (-1) (* “ (<) = 2 (-1) 
Then we may write 


Da((v)) = i =( _ 7 Py” + . (—1)"" K(n, a, b, m)P}”. 


b=m+n—at+1 





ON THE PROBLEM OF TESTING HYPOTHESES 
By R. v. Mises


Harvard University 


1. Introduction. The following is known as the problem of testing a simple
statistical hypothesis. The probability distribution of a variate X depends on
a parameter $\vartheta$. In the course of experiments each time a value x of X is observed,
one pronounces one of the two assertions: “$\vartheta$ equals $\vartheta_0$” or “$\vartheta$ is different from
$\vartheta_0$.” The first assertion is made when the observed value x falls in a “region of
acceptance” A, the second, if x falls in the complementary region $\bar{A}$. What is
the chance of these assertions being correct and how can A be chosen to make
this chance as high as possible?

The distribution for the variate X is considered as given. Let $P(x \mid \vartheta)$ be
the probability of the value of X being $\le x$. It is obvious that to know $P(x \mid \vartheta)$
is not sufficient for computing the success or error chances of the above assertions.
There is another distribution function $P_0(\vartheta)$ involved which we may call the
initial or the a priori or the over-all distribution of the parameter $\vartheta$. The
meaning of $P_0(\vartheta)$ is as follows. In the infinite sequence of trials there will be
among the first N experiences $N_1$ cases where the assertion that the parameter
value is $\le \vartheta$ proves correct. Then $P_0(\vartheta)$ is the limit of the ratio $N_1/N$ when N
tends to infinity. If $N_0$ is the number of cases in which the actually pronounced
assertions $\vartheta = \vartheta_0$ or $\vartheta \ne \vartheta_0$ respectively, prove correct, the limit of $N_0/N$ is the
success chance and of $1 - N_0/N$ the error chance of the test under consideration.
It would not make any sense to assume that an error chance exists but the over-all
chance $P_0(\vartheta)$ does not.¹

The success and error chances for the assertions $\vartheta = \vartheta_0$ and $\vartheta \ne \vartheta_0$ depend on
both functions $P(x \mid \vartheta)$ and $P_0(\vartheta)$. But in most practical cases nothing or very
little is known about the parameter distribution. Usually, only the limits
within which $\vartheta$ varies are known, or a set of distinct values is given which $\vartheta$
can assume. Therefore, the problem of testing a hypothesis must be modified
in the following way. We ask: What can be said about the error and success chances
of the two alternative assertions and about the choice of the region of acceptance, if
$P_0(\vartheta)$ is entirely or partly unknown? This form of the question corresponds
more or less to the conception generally adopted today.

In section 4 of this paper a complete answer to the question is presented for
the case of a parameter distribution that is entirely unknown except for the range
of possible $\vartheta$-values. This solution, with the restriction to a parameter assuming
distinct values only, was already given by Robert W. B. Jackson in a paper
devoted mainly to some genetical problems [1]. The particular circumstances
prevailing under the restriction to distinct parameter values will be discussed

¹ The expression “chance” rather than “probability” is used here since no randomness
is required. Cf. the author's paper [2] p. 157.







TESTING HYPOTHESES 239 


in section 8. In section 6 the result is extended to composite hypotheses and
in section 7 to problems in several dimensions. An important case of restrictions
imposed on $P_0(\vartheta)$ is discussed in section 9.

In the preceding lines the subject of testing a statistical hypothesis was presented
in its simplest form, with one scalar variate and one parameter, in order
to discard all non-essential complications which would serve only to veil the
principal point. For the same reason it is to be understood, in the following
text, that region (in one dimension) will mean an interval or a finite number of
intervals, and distribution will mean a set of concentrated values at distinct
points with a continuous density in between or a continuous density throughout.
If, for the sake of brevity, a Stieltjes integral is used, nothing else is meant than
the combination of a sum and an ordinary integral of a continuous function.
With respect to the parameter $\vartheta$ the distributions $P(x \mid \vartheta)$ are considered as
either defined for distinct $\vartheta$-values only or as continuous functions, etc.


2. Error chance. Success rate. J. Neyman, who must be credited with
successfully promoting many problems of mathematical statistics, introduced
the distinction between errors of first and second type and made this the basis
of his approach in dealing with the theory of tests. An error of first kind is
committed if the assertion $\vartheta \ne \vartheta_0$ is made when $\vartheta$ equals $\vartheta_0$; an error of second
kind occurs when the assertion $\vartheta = \vartheta_0$ proves incorrect.² The chances $P_I$ and
$P_{II}$ of these two events can easily be computed, if the distributions $P(x \mid \vartheta)$ and
$P_0(\vartheta)$ are considered as known. From $P(x \mid \vartheta)$ we derive the probability $P(A \mid \vartheta)$
for x falling in the region A. In particular $P(A \mid \vartheta_0)$ will be designated by
$1 - \alpha$. Thus $\alpha$ is the probability of x falling in $\bar{A}$ when $\vartheta = \vartheta_0$. The function
$P_0(\vartheta)$ can have, at the point $\vartheta = \vartheta_0$, a jump of magnitude $\pi_0$. The set of all
$\vartheta$-values except $\vartheta_0$ will be called H. Then the two error chances are obviously

\[
P_I = \alpha \pi_0, \qquad P_{II} = \int_{(H)} P(A \mid \vartheta)\, dP_0(\vartheta). \tag{1}
\]

By the integral over H is meant that the term $P(A \mid \vartheta_0)$ in the summation has
to be omitted. The formulae (1) show anew that it would be senseless to speak
of error chances without assuming that an over-all distribution $P_0(\vartheta)$ exists.

In all papers that follow Neyman's line of thought first and second type
error chances are discussed. But the formulae (1) are seldom written down.³
It is incorrect to say that $\alpha$ is the chance of a first type error and it is likewise
incorrect to say that the chance of a second type error depends on $\vartheta$; it depends
on the distribution of $\vartheta$.

The total error chance is

\[
P_E = \alpha \pi_0 + \int_{(H)} P(A \mid \vartheta)\, dP_0(\vartheta) \tag{2}
\]

² See e.g. ref. [4], [5] or various other publications by the same author.
³ They are included e.g. in equation (1) of A. Wald's paper [5].





240 R. v. MISES 


and $1 - P_E$ is the success chance. If the distribution $P(x \mid \vartheta)$, the region of
acceptance A, and the test value $\vartheta_0$ are given, $P_E$ depends on $P_0(\vartheta)$ only. If we
make $P_0(\vartheta)$ coincide successively with all functions not excluded by some
preliminary knowledge about the over-all distribution, there must exist a definite
least upper bound (l.u.b.) of $P_E$ since $P_E$ has the upper bound 1. The value

\[
S = 1 - \text{l.u.b. } P_E
\]

is the greatest lower bound of the success chance. In other words, for any
positive $\varepsilon$ there exists a $P_0(\vartheta)$ for which the success chance is $S + \varepsilon$ and S is
the greatest number for which this holds true. We therefore call S the sure
success rate or, briefly, the success rate for the test under consideration. If the
success rate S′ for a region of acceptance A′ is greater than S, the test using
A′ will be briefly called preferable to that using A.

Neyman's approach consists in comparing two regions A and A′ with the
same $\alpha$. The difference of the respective error chances $P_E$ and $P'_E$ is according
to (2):

\[
P_E - P'_E = \int_{(H)} [P(A \mid \vartheta) - P(A' \mid \vartheta)]\, dP_0(\vartheta). \tag{3}
\]

This difference is non-negative, whatever is taken for $P_0(\vartheta)$, if for all values of $\vartheta$

\[
P(A \mid \vartheta) \ge P(A' \mid \vartheta). \tag{4}
\]

In this case $P_E \ge P'_E$ and l.u.b. $P_E \ge$ l.u.b. $P'_E$ and therefore $S \le S'$. If a
region A′ can be found for which (4) holds for whatever A, Neyman calls the
test using A′ a most powerful test. In fact, this test has at least as large a success
rate as any other test using a region of acceptance with the same $\alpha$. Neyman
does not use the concept of success rate as introduced here, but implicitly the
success chance is the criterion underlying his analysis of tests.⁴

The theory of most powerful tests would supply a complete solution of our
problem, if (1) a most powerful test existed in all cases, i.e. for all distributions
$P(x \mid \vartheta)$ and all $\vartheta_0$; and if (2) a sufficient indication how to choose $\alpha$ were given.
Unfortunately it turns out that in almost no practical case a region A′ of this
kind can be found. The various substitutes for a most powerful test as proposed
by Neyman and others (unbiased test, test of type A, etc.) need not be discussed
here, since it is obvious that nothing can be said about the difference $S - S'$,
if (4) is not fulfilled for all A and $\vartheta$. As to the choice of $\alpha$, the expression


4 This can be seen e.g. from the justification of most powerful tests as given by A. Wald 
[7] p. 15-16. Moreover, the recommendation of a test with highest success rate as the 
‘“‘best’’ (which is not the purpose of the present paper) could be justified from the stand- 
point of the general theory developed by Wald [6]. Wald introduces an arbitrary weight 
function for defining a “‘best”’ test. If the error weight is taken as one in the case of a false 
answer and as zero for each correct answer, Wald’s ‘“‘best’’ test coincides with the test of 
highest success rate. The present paper includes only statements that refer to the actual 
numbers of correct and false answers, independently of any arbitrary assumption about 
an error weight. 







“level of significance” used by Neyman, leaves it open whether a high or a low
value of $\alpha$ is preferable.


3. Preliminary example. Before attacking the general problem the discussion
of a very simple example may provide some information. Let the distribution
of the variate X be given by the density

\[
p(x \mid \vartheta) = 1 + \vartheta^2 \left(x^2 - \tfrac{1}{3}\right), \qquad 0 < x < 1. \tag{5}
\]

It is immediately seen that the integral of p over the interval 0 to 1 equals 1 for
each $\vartheta$ and that $p \ge 0$, if $\vartheta$ lies in the limits $-\sqrt{3}$, $+\sqrt{3}$. Let this be the only
information we possess about the over-all distribution $P_0(\vartheta)$. The value to be
tested may be $\vartheta_0 = 0$. The density for this parameter value reduces to
$p(x \mid 0) = 1$ and thus the probability of x falling within the interval $x_1$, $x_2$
equals $x_2 - x_1$, if $\vartheta = \vartheta_0$. According to the notation introduced above we
may consider as intervals of acceptance A all intervals with the limits $x_1$,
$x_1 + 1 - \alpha$, where $0 \le x_1 \le \alpha$.
The function $P(A \mid \vartheta)$ is now given by

\[
P(A \mid \vartheta) = \int_{x_1}^{x_1 + 1 - \alpha} p(x \mid \vartheta)\, dx
= 1 - \alpha + (1 - \alpha)\,\vartheta^2 \left[ x_1^2 + x_1 (1 - \alpha) - \frac{\alpha (2 - \alpha)}{3} \right]. \tag{6}
\]

In particular, for the interval A′ between 0 and $1 - \alpha$:

\[
P(A' \mid \vartheta) = 1 - \alpha - (1 - \alpha)\, \frac{\alpha (2 - \alpha)}{3}\, \vartheta^2. \tag{7}
\]

The difference of these two expressions is non-negative:

\[
P(A \mid \vartheta) - P(A' \mid \vartheta) = (1 - \alpha)\, \vartheta^2\, x_1 (x_1 + 1 - \alpha). \tag{8}
\]

Thus the interval 0, $1 - \alpha$ is seen to be a most powerful one. The error chance
of this test is according to (2):

\[
P_E = \alpha \pi_0 + \int_{(H)} \left[ 1 - \alpha - (1 - \alpha)\, \frac{\alpha (2 - \alpha)}{3}\, \vartheta^2 \right] dP_0(\vartheta)
= \alpha \pi_0 + (1 - \alpha)(1 - \pi_0) - (1 - \alpha)\, \frac{\alpha (2 - \alpha)}{3} \int_{(H)} \vartheta^2\, dP_0(\vartheta). \tag{9}
\]

The last integral is non-negative and can approach zero indefinitely since the
total amount $1 - \pi_0$ can be concentrated at a point $\vartheta \ne 0$ with $|\vartheta| < \varepsilon$. Therefore
the l.u.b. of $P_E$ for given $\alpha$ and $\pi_0$ is

\[
\alpha \pi_0 + (1 - \alpha)(1 - \pi_0).
\]

On the other hand, this is a linear function of $\pi_0$ which takes its extreme values
at the ends of its interval, $\pi_0 = 0$ and $\pi_0 = 1$. Thus the larger of the two values



$\alpha$ and $1 - \alpha$ is the l.u.b. of $P_E$, if $P_0(\vartheta)$ is subjected to no further restriction.
The success rate of the test under consideration is accordingly the smaller of the
two quantities $\alpha$ and $1 - \alpha$.

For $\alpha = 0.99$ or $\alpha = 0.01$ the success rate is 0.01. This means: If we use the
most powerful test at a level of significance of either 99% or 1%, we risk in both
cases that 99% of all assertions will be false. If $\alpha = \tfrac{1}{2}$, the success rate reaches
its maximum value which is $\tfrac{1}{2}$ too. On the other hand it can be seen that each
interval of length $\tfrac{1}{2}$ with not too large $x_1$ would lead to the same success rate.
In fact, the error chance $P_E$ for the interval $x_1$, $x_1 + 1 - \alpha$ is according to (9)
and (6)

\[
P_E = \alpha \pi_0 + (1 - \alpha)(1 - \pi_0) - (1 - \alpha) \left[ \frac{\alpha (2 - \alpha)}{3} - x_1 (x_1 + 1 - \alpha) \right] \int_{(H)} \vartheta^2\, dP_0(\vartheta). \tag{9'}
\]

Therefore, the same reasoning as before applies, if the factor in brackets is
non-negative. This is the case for $\alpha = \tfrac{1}{2}$ if the interval begins at a point
$x_1 \le \tfrac{1}{4}(\sqrt{5} - 1) = 0.309$. Among these intervals, that with $x_1 = 0$ can be
considered as preferable since its success chance for any $P_0(\vartheta)$ is at least as high
as that of any other interval.
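
The figures in this example can be reproduced numerically. The Python sketch below is an illustration rather than part of the paper: it assumes the density (5) in the form $1 + \vartheta^2(x^2 - 1/3)$ and the closed form (6), cross-checks the closed form by a midpoint-rule integration, and approximates the l.u.b. over $\vartheta$ by a maximum over a grid on $[-\sqrt{3}, \sqrt{3}]$. The function names are hypothetical.

```python
import math


def p_accept(x1, alpha, theta):
    """Closed form (6): P(A | theta) for A = [x1, x1 + 1 - alpha]."""
    bracket = x1 * x1 + x1 * (1 - alpha) - alpha * (2 - alpha) / 3
    return (1 - alpha) + (1 - alpha) * theta ** 2 * bracket


def p_accept_numeric(x1, alpha, theta, steps=2000):
    """Midpoint-rule integral of density (5) over A, used as a cross-check."""
    width = (1 - alpha) / steps
    total = 0.0
    for i in range(steps):
        x = x1 + (i + 0.5) * width
        total += (1 + theta ** 2 * (x * x - 1 / 3)) * width
    return total


def success_rate(x1, alpha, grid=2001):
    """Smaller of 1 - alpha and 1 - beta, with beta the grid maximum of (6).

    By continuity the maximum over the grid approximates the l.u.b. over H.
    """
    root3 = math.sqrt(3)
    thetas = [-root3 + 2 * root3 * i / (grid - 1) for i in range(grid)]
    beta = max(p_accept(x1, alpha, t) for t in thetas)
    return min(1 - alpha, 1 - beta)


rate_at_1_percent = success_rate(0.0, 0.01)   # most powerful test, alpha = 0.01
rate_at_half = success_rate(0.0, 0.5)         # most powerful test, alpha = 1/2
x1_limit = (math.sqrt(5) - 1) / 4             # about 0.309
flat_value = p_accept(x1_limit, 0.5, 1.7)     # independent of theta at the limit
```

At the limiting starting point the bracket in (6) vanishes, so the acceptance probability stays at one half for every $\vartheta$; this is why that interval attains the same success rate as the interval starting at zero.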

Now, let us assume that in the definition (5) of $P(x \mid \vartheta)$ the factor $\vartheta^2$ is replaced
by some function $g(\vartheta)$ which takes positive and negative values (within $-3/2$
and 3) while $\vartheta$ varies from $-\sqrt{3}$ to $\sqrt{3}$. Then equation (6) shows that for
any two intervals of acceptance A and A′ the difference $P(A \mid \vartheta) - P(A' \mid \vartheta)$
changes its sign at least once with varying $\vartheta$. Thus no most powerful test
interval exists. But, applying (9) and calling $g_1$ the (negative) minimum value
of $g(\vartheta)$ we find now

\[
\alpha \pi_0 + (1 - \pi_0)(1 - \alpha) \left[ 1 - g_1\, \frac{\alpha (2 - \alpha)}{3} \right]
\]

as the l.u.b. of the error chance of A′ for given $\alpha$ and $\pi_0$. Thus the smaller of the
quantities

\[
1 - \alpha \qquad \text{and} \qquad 1 - (1 - \alpha) \left[ 1 - g_1\, \frac{\alpha (2 - \alpha)}{3} \right]
\]

is the success rate of the test using A′. If $g_1$ is given we can find, by differentiation,
the value supplying the highest success rate. Using (9′) instead of (9)
we find in a similar way the success rates for any other interval. It turns out
that $S = \tfrac{1}{2}$ for the interval extending from the above given value $x_1 = 0.309$ to
0.809.

There are three things we may learn from this example. (1) It can happen 
that a most powerful test, at a high or at a low level of significance, has an 
extremely poor success rate; (2) In the case where a most powerful test with 
the highest possible success rate exists, there may be other intervals with the 
same success rate; (3) If no most powerful test exists, there is no need to look 
















for some substitute definition; the success rate for any kind of test can be found 
independently of its being most powerful or not. 


4. General solution for a simple hypothesis. The distribution $P(x \mid \vartheta)$ of the
variate X, the parameter value $\vartheta_0$ to be tested, and the set of all possible values
of $\vartheta$ are supposed to be given. The set of all possible $\vartheta$-values except $\vartheta_0$ is called
H. Choose a region of acceptance A and compute first, for all $\vartheta$, the magnitude

\[
P(A \mid \vartheta) = \int_{(A)} dP(x \mid \vartheta). \tag{10}
\]

In particular, the value of this integral for $\vartheta = \vartheta_0$ will be called $1 - \alpha$ and its
maximum value or its l.u.b. on H will be denoted by $\beta$:

\[
P(A \mid \vartheta_0) = 1 - \alpha, \qquad \underset{H}{\text{l.u.b.}}\; P(A \mid \vartheta) = \beta. \tag{11}
\]

The chance of committing an error in asserting $\vartheta = \vartheta_0$ when x falls in A or
$\vartheta \ne \vartheta_0$ in the case x falls in the complement $\bar{A}$ is according to (2)

\[
P_E = \alpha \pi_0 + \int_{(H)} P(A \mid \vartheta)\, dP_0(\vartheta),
\]

where $\pi_0$ is the jump of $P_0(\vartheta)$ at the abscissa $\vartheta = \vartheta_0$, or the a priori chance of
$\vartheta_0$. The total amount of $P_0(\vartheta)$ over the domain of integration H is $(1 - \pi_0)$ and
therefore $\beta(1 - \pi_0)$ is the l.u.b. of the integral. Thus⁵

\[
\text{l.u.b. } P_E = \max\, \{\alpha \pi_0 + \beta (1 - \pi_0)\}.
\]

As $\pi_0$ can take all values between zero and one, the least upper bound of $P_E$
is either $\alpha$ or $\beta$, whichever is larger. The success rate S, i.e. the greatest lower
bound of $1 - P_E$, is consequently the smaller of the quantities $1 - \alpha$ and $1 - \beta$.

If the distribution $P(x \mid \vartheta)$ is given and a region of acceptance A for a test value
$\vartheta_0$ chosen, the success rate of this test equals the smaller of the two quantities

\[
1 - \alpha = P(A \mid \vartheta_0) \qquad \text{and} \qquad 1 - \beta = 1 - \underset{H}{\text{l.u.b.}}\; P(A \mid \vartheta), \tag{12}
\]

if nothing is known about the initial distribution of the parameter except its range.
Finding a region of acceptance, A, with the highest success rate, is then a simple
maximum-minimum problem.

This solution is not restricted to some rarely occurring type of distributions
$P(x \mid \vartheta)$ and it is insofar a complete one as it does not leave undetermined the
value of $\alpha$. Using Neyman's terminology we would have to say: The success
rate is the smaller of the two quantities: 1 minus level of significance and minimum
power of the test.
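
The rule (12) translates directly into a short computation. The following Python sketch is a hypothetical illustration, not an example from the paper: it assumes a binomial variate with 10 trials, the test value 0.5, only two admissible alternatives 0.2 and 0.8, and the acceptance region {3, ..., 7}. The function names are invented for the sketch.

```python
from math import comb


def binomial_accept_prob(theta, n, region):
    """P(A | theta) for a binomial variate with n trials and region A."""
    return sum(comb(n, x) * theta ** x * (1 - theta) ** (n - x) for x in region)


def success_rate(accept_prob, theta0, alternatives):
    """Smaller of 1 - alpha = P(A | theta0) and 1 - beta, as in (12).

    beta is the largest acceptance probability over the alternatives H,
    here a finite list of distinct parameter values.
    """
    one_minus_alpha = accept_prob(theta0)
    beta = max(accept_prob(t) for t in alternatives)
    return min(one_minus_alpha, 1 - beta)


n = 10
region = range(3, 8)   # accept when 3 <= x <= 7 (assumed for illustration)
one_minus_alpha = binomial_accept_prob(0.5, n, region)
rate = success_rate(lambda t: binomial_accept_prob(t, n, region), 0.5, [0.2, 0.8])
```

Searching over candidate regions for the largest value returned by `success_rate` is the maximum-minimum problem mentioned in the text. Because the alternatives in this sketch are isolated values, the resulting rate is not limited in the way it would be for a parameter varying continuously around the test value.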

It follows from the definitions (12) that, if $P(A \mid \vartheta)$ is continuous in a $\vartheta$-interval
including $\vartheta_0$, and $\vartheta$ is allowed to take all values of this interval, $\beta$ cannot
be smaller than $1 - \alpha$:

\[
\beta \ge 1 - \alpha \qquad \text{or} \qquad \alpha + \beta \ge 1.
\]

Thus $1 - \alpha$ and $1 - \beta$ cannot possibly both be greater than $\tfrac{1}{2}$. The greatest
possible success rate is then $\tfrac{1}{2}$ and it can be reached only if $\alpha = \beta = \tfrac{1}{2}$. We state:
No test can have a success rate S greater than $\tfrac{1}{2}$, if $\vartheta$ can vary in an interval including
$\vartheta_0$ without any restriction and $P(A \mid \vartheta)$ is a continuous function of $\vartheta$ in this interval.

We will see later, in sections 8 and 9, how certain restrictions imposed on
$P_0(\vartheta)$ which are effective in some problems improve the success rate of a test.

⁵ This formula was given by Jackson [1] p. 148 for the “case when the set of alternatives
is discontinuous”. Jackson calls the test with highest success rate a “most stringent test”.


5. Examples. Let us assume that the variate $X$ is normally distributed according to

(13)    $P(x \mid \vartheta) = \Phi[h(x - \vartheta)], \qquad \Phi(u) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{u} e^{-z^2}\, dz.$

The parameter value to be tested may be taken as $\vartheta_0 = 0$ without loss of generality, since in all other cases $X - \vartheta_0$ can be considered as the variate. If the interval $x_1, x_2$ is chosen for the region of acceptance, we have

(14)    $P(A \mid \vartheta) = \Phi[h(x_2 - \vartheta)] - \Phi[h(x_1 - \vartheta)].$

The right hand side becomes a maximum, if

$\Phi'[h(x_2 - \vartheta)] = \Phi'[h(x_1 - \vartheta)]$, i.e. $\vartheta = \tfrac12 (x_1 + x_2).$

Therefore, for $\vartheta_0 = 0$

$1 - \alpha = \Phi(h x_2) - \Phi(h x_1), \qquad \beta = \Phi(\tfrac12 h(x_2 - x_1)) - \Phi(\tfrac12 h(x_1 - x_2)).$

Both quantities have the value $\tfrac12$, if and only if

(15)    $x_1 = -x_2, \qquad \Phi(h x_1) = \tfrac14, \qquad \Phi(h x_2) = \tfrac34.$

These are the probable limits of $x$. The conclusion is that the probable limits supply the interval with the highest possible success rate $S = \tfrac12$.
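The statement about the probable limits can be checked numerically. The following is only a sketch under stated assumptions: it uses a unit-variance normal law from Python's standard library in place of the paper's precision $h$ (which merely rescales $x$), and it approximates the l.u.b. over the alternatives by a maximum over a finite grid. The names `prob_accept`, `grid` and `success_rate` are illustrative and do not come from the paper.

```python
from statistics import NormalDist

# Assumption: unit-variance normal error law; the paper's precision h only rescales x.
nd = NormalDist(mu=0.0, sigma=1.0)

def prob_accept(theta, x1, x2):
    """P(A | theta) for the acceptance interval [x1, x2], as in (14)."""
    return nd.cdf(x2 - theta) - nd.cdf(x1 - theta)

# Probable limits: the quartiles of the distribution under theta_0 = 0, as in (15).
x1, x2 = nd.inv_cdf(0.25), nd.inv_cdf(0.75)

one_minus_alpha = prob_accept(0.0, x1, x2)

# beta is the l.u.b. of P(A | theta) over theta != 0; approximate it on a grid
# of alternatives that comes close to 0 from both sides.
grid = [k / 1000 for k in range(-5000, 5001) if k != 0]
beta = max(prob_accept(t, x1, x2) for t in grid)

success_rate = min(one_minus_alpha, 1.0 - beta)
```

With these assumptions both $1 - \alpha$ and $1 - \beta$ come out at one half to within the grid resolution, in line with (15).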

The result is not restricted to the particular form of the function $\Phi$; it remains valid if $\Phi$ is replaced by any function whose derivative $\Phi'$ has one maximum and decreases both ways symmetrically. It is well known that this test, which has always been used by statisticians and is here proved to have the maximum success rate, is neither most powerful nor even, for a general $\Phi$, unbiased. We also see that the interval determined by (15) is the only closed interval with maximum success rate.

Our method supplies the analogous solution for the case of an unsymmetric distribution also. Assume the density

(16)    $p(x \mid \vartheta) = f(x - \vartheta),$

where $f(u)$ is supposed to have only one maximum, say at the point $u = 0$. The value to be tested may again be chosen as $\vartheta_0 = 0$. For the interval $x_1, x_2$ as region of acceptance we have

$P(A \mid \vartheta) = \int_{x_1}^{x_2} f(x - \vartheta)\, dx = \int_{x_1 - \vartheta}^{x_2 - \vartheta} f(u)\, du.$

The last expression becomes a maximum with respect to $\vartheta$, if

$f(x_1 - \vartheta) = f(x_2 - \vartheta).$

The maximum will occur at the point $\vartheta = 0$ and accordingly coincide with $1 - \alpha$, if $f(x_1) = f(x_2)$. Thus we have a region of acceptance with the highest possible success rate $\tfrac12$, if $x_1, x_2$ are determined by

(17)    $\int_{x_1}^{x_2} f(u)\, du = \tfrac12, \qquad f(x_1) = f(x_2).$

Under the assumptions made for $f(u)$ there exists exactly one pair of values $x_1, x_2$ obeying these equations. This kind of test too has been much used by statisticians, but an account of its merits has so far not been given.

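Another example is supplied by the density function

(18)    $p(x \mid \vartheta) = \vartheta^2 x e^{-\vartheta x}, \qquad x \geq 0, \quad \vartheta > 0.$

We derive for an interval $x_1, x_2$

$P(A \mid \vartheta) = \int_{x_1}^{x_2} p(x \mid \vartheta)\, dx = (\vartheta x_1 + 1) e^{-\vartheta x_1} - (\vartheta x_2 + 1) e^{-\vartheta x_2}.$

If $\vartheta_0$ is the value to be tested, we have

(19)    $1 - \alpha = (\vartheta_0 x_1 + 1) e^{-\vartheta_0 x_1} - (\vartheta_0 x_2 + 1) e^{-\vartheta_0 x_2}.$

One may ask for an interval $x_1, x_2$ with the success rate $S = \tfrac12$. Then equation (19) must be fulfilled with $\alpha = \tfrac12$ and, moreover, $P(A \mid \vartheta)$ must take its maximum value at $\vartheta = \vartheta_0$. This provides the second condition

(19')    $\frac{\partial P(A \mid \vartheta)}{\partial \vartheta} = 0$ at $\vartheta = \vartheta_0$, i.e. $x_2^2 e^{-\vartheta_0 x_2} = x_1^2 e^{-\vartheta_0 x_1}.$

There exists, for each $\vartheta_0 > 0$, one and only one pair of values $x_1, x_2$ obeying the two equations (19) and (19').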
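The pair $x_1, x_2$ has no closed form, but the two conditions can be solved numerically. The sketch below assumes $\vartheta_0 = 1$ (an illustrative choice) and uses a plain bisection written out by hand; `bisect`, `partner` and `g` are names introduced here, not taken from the paper.

```python
import math

THETA0 = 1.0  # assumed test value, chosen only for illustration

def prob_accept(theta, x1, x2):
    """P(A | theta) for the density theta^2 * x * exp(-theta * x) on [x1, x2]."""
    return (theta * x1 + 1) * math.exp(-theta * x1) - (theta * x2 + 1) * math.exp(-theta * x2)

def g(x):
    """x^2 * exp(-theta_0 * x); condition (19') asks for g(x1) == g(x2)."""
    return x * x * math.exp(-THETA0 * x)

def bisect(func, lo, hi, iters=200):
    """Root of func on [lo, hi], assuming func(lo) and func(hi) differ in sign."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

MODE = 2.0 / THETA0  # g increases on (0, MODE) and decreases beyond it

def partner(x1):
    """The x2 > MODE with g(x2) == g(x1), i.e. condition (19')."""
    return bisect(lambda x2: g(x2) - g(x1), MODE, 200.0 / THETA0)

# Condition (19) with alpha = 1/2: choose x1 so that P(A | theta_0) = 1/2.
x1 = bisect(lambda x: prob_accept(THETA0, x, partner(x)) - 0.5, 0.01 * MODE, 0.99 * MODE)
x2 = partner(x1)
```

Because the interval shrinks steadily as $x_1$ moves towards the mode of `g`, the outer bisection has a single root, which agrees with the uniqueness claim in the text.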

In all these examples it turned out that at least one interval with the success rate $S = \tfrac12$ (the highest value for a distribution continuous with respect to $\vartheta$) exists. It seems that this is a common property of most usual distribution functions $P(x \mid \vartheta)$. But we can easily give an example where the greatest $S$, at least for a single interval as region of acceptance, is smaller than $\tfrac12$. Assume

(20)    $P(x \mid \vartheta) = x + \vartheta x (1 - x)(2\vartheta^2 x - 1), \qquad 0 \leq x \leq 1, \quad -1 \leq \vartheta \leq 1,$

and let $\vartheta_0 = 0$ be the value subjected to testing. For any interval beginning at $x$ and extending to $x + 1 - \alpha$ we find

(21)    $P(A \mid \vartheta) = 1 - \alpha + a\vartheta + b\vartheta^3$ with $a = (1 - \alpha)(2x - \alpha), \quad b = 2(1 - \alpha)(-3x^2 + 3\alpha x - \alpha^2 + \alpha - x).$

It is a necessary condition for a test with $S = \tfrac12$ (in the case of a differentiable $P(A \mid \vartheta)$) that the derivative of $P(A \mid \vartheta)$ vanishes at $\vartheta = \vartheta_0$. Thus we must have

$\frac{\partial P(A \mid \vartheta)}{\partial \vartheta} = a + 3b\vartheta^2 = 0$ for $\vartheta = 0.$

This shows that $2x - \alpha$ must be zero, or $x = \tfrac14$ when $\alpha = \tfrac12$. On the other hand, for $\alpha = \tfrac12$, $x = \tfrac14$ the formula for $P(A \mid \vartheta)$ becomes

$P(A \mid \vartheta) = \tfrac12 + \tfrac{3}{16}\vartheta^3.$

Thus $P$ has an inflexion point at $\vartheta = 0$ and its maximum, $\beta$, must be greater than $\tfrac12$. In the present example, as $\vartheta$ goes up to 1, we have $\beta = 11/16$ and the success rate is $S = 5/16$. This does not exclude that intervals with a success rate between $5/16$ and $\tfrac12$ exist. E.g. for $x = 0.45$ and $\alpha = \tfrac12$ one finds the maximum $\beta = 0.60$ and thus $S = 0.40$. The optimum interval can be found by differentiating the formula for $P(A \mid \vartheta)$ with respect to $x$ and $\alpha$.
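The two numerical cases quoted above can be reproduced directly from the distribution function. This is a minimal sketch: the l.u.b. over the alternatives is replaced by a maximum over an evenly spaced grid of parameter values, and the function names are illustrative.

```python
def cdf(x, theta):
    """P(x | theta) from (20), for 0 <= x <= 1 and -1 <= theta <= 1."""
    return x + theta * x * (1 - x) * (2 * theta**2 * x - 1)

def prob_accept(theta, x, alpha):
    """P(A | theta) for the interval from x to x + 1 - alpha."""
    return cdf(x + 1 - alpha, theta) - cdf(x, theta)

def success_rate(x, alpha, steps=2000):
    """min(1 - alpha, 1 - beta), with beta taken over a grid of theta != 0."""
    thetas = [-1 + 2 * k / steps for k in range(steps + 1)]
    beta = max(prob_accept(t, x, alpha) for t in thetas if t != 0)
    return min(1 - alpha, 1 - beta)

s_quarter = success_rate(0.25, 0.5)   # interval [0.25, 0.75]
s_shifted = success_rate(0.45, 0.5)   # interval [0.45, 0.95]
```

The first interval gives $5/16$; the second gives about 0.405, which the text rounds to 0.40.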

Examples with $\vartheta$ restricted to distinct values will be discussed in section 8.


6. Composite hypotheses. We have the problem of testing a composite hypothesis, if instead of one value $\vartheta_0$ a region $H$ of $\vartheta$-values is given and the assertions to be made in the course of experiments are "$\vartheta$ belongs to $H$" or "$\vartheta$ does not belong to $H$." The solution developed in section 4 applies to this case almost without modification.

Again, let $P(A \mid \vartheta)$ be the probability of $x$ falling in the region of acceptance $A$. By $\bar{A}$ and $\bar{H}$ we denote the regions complementary to $A$ in the sample space and to $H$ in the $\vartheta$-space. Then the error chance is

(23)    $P_E = \int_{(H)} [1 - P(A \mid \vartheta)]\, dP_0(\vartheta) + \int_{(\bar{H})} P(A \mid \vartheta)\, dP_0(\vartheta).$

This is an obvious generalisation of (2). The equation expresses the fact that each time $x$ falls in $\bar{A}$ and $\vartheta$ in $H$, or $x$ in $A$ and $\vartheta$ in $\bar{H}$, an error is committed. Let us use the notations

$\alpha = $ l.u.b. of $P(\bar{A} \mid \vartheta)$ for $\vartheta$ in $H$,
$\beta = $ l.u.b. of $P(A \mid \vartheta)$ for $\vartheta$ in $\bar{H}$.

Then the first of the two integrals in the expression for $P_E$ cannot be greater than $\alpha \pi_0$ and the second not greater than $\beta(1 - \pi_0)$, where $\pi_0$ is the a priori chance of $H$. On the other hand no lower upper bound exists for either of these integrals, if $\pi_0$ is given and $P_0(\vartheta)$ subjected to no other restriction.

As $\pi_0$ varies between 0 and 1, the expression

$\alpha \pi_0 + \beta(1 - \pi_0)$

has its extreme values at the points $\pi_0 = 0$ and $\pi_0 = 1$ and these values are $\beta$ and $\alpha$ respectively. Accordingly the greater of the quantities $\alpha$ and $\beta$ is the l.u.b. of $P_E$ and the success rate $S$ equals the smaller of the two quantities $1 - \alpha$ and $1 - \beta$. If $P(A \mid \vartheta)$ is continuous with respect to $\vartheta$, we have again $\beta \geq 1 - \alpha$, thus $\alpha$ and $\beta$ cannot be both smaller than $\tfrac12$ and no $S$ can become $> \tfrac12$.

If the hypothesis that $\vartheta$ lies in $H$ is tested by means of a region of acceptance $A$, the success rate of this test equals the smaller of the two quantities $1 - \alpha$ and $1 - \beta$, which are the minimum of $P(A \mid \vartheta)$ for $\vartheta$-values in $H$ and the minimum of $P(\bar{A} \mid \vartheta)$ for $\vartheta$-values outside $H$. The task of finding the region $A$ with highest success rate is thus reduced to a simple maximum-minimum problem.

As an example let us take the density function

(25)    $p(x \mid \vartheta) = f(x - \vartheta),$

where $f(u)$ has a maximum at $u = 0$ and drops on both sides symmetrically and monotonically towards zero. The hypothesis to be tested may be given as

$-b \leq \vartheta \leq b.$

We find, if the interval $x_1, x_2$ is taken for region of acceptance:

(26)    $P(A \mid \vartheta) = \int_{x_1}^{x_2} f(x - \vartheta)\, dx = \int_{x_1 - \vartheta}^{x_2 - \vartheta} f(u)\, du.$

This function of $\vartheta$ has its maximum at $\vartheta = \tfrac12 (x_1 + x_2)$ and drops symmetrically on both sides. If $\tfrac12 (x_1 + x_2)$ is supposed to lie in the interval $(0, b)$ we find

$1 - \alpha = \int_{x_1 + b}^{x_2 + b} f(u)\, du, \qquad \beta = \int_{x_1 - b}^{x_2 - b} f(u)\, du.$

Both quantities reach the value $\tfrac12$, if we choose $x_2 = -x_1 = a$ and take for $a$ the uniquely determined solution of

(27)    $\int_{-a + b}^{a + b} f(u)\, du = \int_{-a - b}^{a - b} f(u)\, du = \tfrac12.$

For this interval the success rate has its highest possible value $\tfrac12$.
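Equation (27) is easily solved by bisection once a particular $f$ is chosen. The sketch below assumes, purely for illustration, that $f$ is the standard normal density and that $b = 1$; neither choice comes from the paper.

```python
from statistics import NormalDist

nd = NormalDist()   # assumed f: standard normal density (symmetric, unimodal)
b = 1.0             # assumed half-width of the hypothesis -b <= theta <= b

def prob_accept(theta, a):
    """P(A | theta) for the acceptance interval [-a, a], as in (26)."""
    return nd.cdf(a - theta) - nd.cdf(-a - theta)

# Solve (27): P(A | theta = b) = 1/2 by bisection; the left side grows with a.
lo, hi = 0.0, b + 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if prob_accept(b, mid) < 0.5:
        lo = mid
    else:
        hi = mid
a = 0.5 * (lo + hi)

# Minimum over H is attained at the two ends theta = -b and theta = +b.
one_minus_alpha = min(prob_accept(-b, a), prob_accept(b, a))
# The l.u.b. outside H is approached as theta tends to +b or -b from outside.
beta = prob_accept(b, a)
```

Under these assumptions the half-length $a$ comes out slightly above $b$, and both $1 - \alpha$ and $\beta$ equal one half.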


7. Case of n variates and k parameters. The analysis given in section 4 for a simple hypothesis and in 6 for a composite one extends immediately to the case where instead of one variate $X$ and one parameter $\vartheta$ a group of $n$ variates $X_1, X_2, \ldots, X_n$ and a group of $k$ parameters $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$ are in question. The region of acceptance $A$ is now a portion of the $n$-dimensional sample space, determined by an interval of a function $F(x_1, x_2, \ldots, x_n)$. The hypothesis to be tested will consist in assuming that the point $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$ falls into a certain region $H$ of the $k$-dimensional parameter space. The success rate of such a test is again the smaller of the numbers $1 - \alpha$ and $1 - \beta$ where $\alpha$ and $\beta$ are defined in exactly the same way as in the preceding section. The minimum of $P(A \mid \vartheta)$ when the $\vartheta$-values fall into $H$ is called $1 - \alpha$, and the maximum of the same function for all $\vartheta$-combinations belonging to the complementary region $\bar{H}$ is $\beta$.

If the test function $F(x_1, x_2, \ldots, x_n)$ is known, the interval with the highest success rate can be found on the same lines as in the case of one variate. In fact, the quantity $F$ takes the place of $x$ in the former analysis. If the interval thus found has the success rate $\tfrac12$, we know that no other test exists which would have a higher success rate as long as nothing is known about the a priori distribution in the parameter space. If a certain $F(x_1, x_2, \ldots, x_n)$ does not lead to an interval with success rate $\tfrac12$, one may try another test function. In the most general case the test function $F$ with the highest success rate would be found by solving the problem of calculus of variation that consists in maximizing $1 - \alpha$ and $1 - \beta$. As a rule such an elaborate analysis will not be necessary.

To ask that a test be a most powerful one is too much and too little. It is 
too much since such a test does not exist in most cases. It is too little because 
there can exist another test (on a different level of significance) with a con- 
siderably higher success rate. The correct description of a most powerful test 
is that such a test can be shown, in a simple way, to have no smaller success 
chance whatever $P_0(\vartheta)$ is than a group of other tests. If a most powerful test
exists, it may be considered preferable to all other tests of the same success rate, 
but there is no reason why it should be considered more favorable than any test 
with higher success rate. As to unbiased tests, and other substitutes for most 
powerful tests, nothing at all can be said about their merits as compared with 
that of other tests. 

A simple example for tests with the highest possible success rate in the case of several dimensions is the following. Assume a density function

(28)    $p(x \mid \vartheta) = f(x_1 - \vartheta_1, x_2 - \vartheta_2, \ldots, x_n - \vartheta_n)$

where $f(u_1, u_2, \ldots, u_n)$ depends on the absolute values $|u_1|, |u_2|, \ldots, |u_n|$ only and decreases monotonically with increasing $u_1^2 + u_2^2 + \cdots + u_n^2$ in all directions. The parameter point $\vartheta_1 = \vartheta_2 = \cdots = \vartheta_n = 0$ is to be tested. Let $F(x_1, x_2, \ldots, x_n)$ be a function likewise depending on $|x_1|, |x_2|, \ldots, |x_n|$ only, vanishing at the origin, and monotonically increasing with $x_1^2 + x_2^2 + \cdots + x_n^2$. Then the set of points for which

(29)    $F(x_1, x_2, \ldots, x_n) \leq C$

is a region of acceptance with success rate $\tfrac12$, if $C$ is chosen in such a way as to have

(30)    $\int_{(F \leq C)} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = \tfrac12.$

This applies e.g. to normal populations. The proof is obvious.
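For a normal population the constant $C$ in (30) can be written down explicitly in the simplest case. The sketch below assumes $n = 2$ independent unit-variance normal variates and the test function $F = x_1^2 + x_2^2$, so that $F$ follows a chi-square law with two degrees of freedom under the tested point. The simulation with a fixed seed is only a plausibility check, not part of the paper's argument.

```python
import math
import random

# Assumption: n = 2 independent unit-variance normal variates, F(x1, x2) = x1^2 + x2^2.
# Under theta = (0, 0), F follows a chi-square law with 2 degrees of freedom,
# so P(F <= C) = 1 - exp(-C / 2) and condition (30) gives C = 2 ln 2.
C = 2.0 * math.log(2.0)

def accept_fraction(theta, draws):
    """Share of simulated points x = theta + noise that fall in F(x) <= C."""
    hits = sum((theta[0] + u) ** 2 + (theta[1] + v) ** 2 <= C for u, v in draws)
    return hits / len(draws)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20000)]

p_null = accept_fraction((0.0, 0.0), draws)   # estimate of 1 - alpha
p_alt = accept_fraction((1.0, 0.0), draws)    # P(A | theta) at one alternative
```

The estimate at the tested point lies near one half, and the acceptance probability at the shifted parameter point is smaller, as the monotonicity assumption on $f$ requires.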

8. Distinct parameter values. Tests with higher success rate than $\tfrac12$ can be found, if the parameter $\vartheta$ is restricted to a set of distinct values. Take for instance our first example in section 3 and assume that $\vartheta$ can only take the three values $0, \pm 1$. Then in the second expression (9) for the error chance the integral cannot approach the value zero since the region $H$ does not include the point $\vartheta = 0$. The minimum value of the integral is $(1 - \pi_0)$ and thus


(31) Pr S am +t (1 — a) - eos — 1). 


The success rate is the smaller of the two quantities 


l1—a and 1-a—a)[1-2@= 8) - 16. 

The best value of $\alpha$ is found by equating $\alpha$ and $\beta$. This gives about $\alpha = \beta = 0.436$ and the success rate $S = 0.564$, for the region of acceptance $x = 0$ to $x = 0.564$. Other intervals or sets of intervals can be examined in the same way.

A more impressive example is the following. We draw $n = 12$ times from an urn which contains three balls, black ones and white ones. The observed value $x$ is the number of white balls drawn. The probability $\vartheta$ of getting a white ball in one experiment can have one of the four values $0, 1/3, 2/3, 1$, and we want to test the hypothesis $\vartheta = \vartheta_0 = 1/3$. The probability distribution is given by

(32)    $p(x \mid \vartheta) = \binom{12}{x} \vartheta^x (1 - \vartheta)^{12 - x}.$

Let us choose the set of points $x = 1, 2, \ldots, 6$ as region of acceptance. Then

(33)    $P(A \mid \vartheta) = \sum_{x=1}^{6} \binom{12}{x} \vartheta^x (1 - \vartheta)^{12 - x}.$

This sum can be computed for the 4 possible $\vartheta$-values:

    for $\vartheta$ =          0      1/3     2/3     1
    $P(A \mid \vartheta)$ =    0      0.926   0.178   0

Thus $1 - \alpha$ has the value 0.926 and $\beta$ equals 0.178. The success rate is the smaller of the two quantities 0.926 and 0.822, thus $S = 0.822$. If we restrict the region of acceptance to the points $x = 1$ to $5$, the values of $1 - \alpha$ and $1 - \beta$ become 0.815 and 0.934, thus the success rate $S = 0.815$. In the first case we have more than 82% chance of making a correct assertion, whatever the a priori probability of $\vartheta$ may be!
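The figures of the urn example follow from the binomial formula alone, so they are easy to recompute. The sketch below uses only Python's standard library; the function names are illustrative.

```python
from math import comb

N_DRAWS = 12
THETAS = [0.0, 1 / 3, 2 / 3, 1.0]   # possible probabilities of drawing a white ball
THETA0 = 1 / 3                      # value under test

def prob_accept(theta, region):
    """P(A | theta): binomial probability that x falls in the region, as in (33)."""
    return sum(comb(N_DRAWS, x) * theta**x * (1 - theta) ** (N_DRAWS - x) for x in region)

def success_rate(region):
    """Smaller of 1 - alpha and 1 - beta for the given region of acceptance."""
    one_minus_alpha = prob_accept(THETA0, region)
    beta = max(prob_accept(t, region) for t in THETAS if t != THETA0)
    return min(one_minus_alpha, 1 - beta)

s_1_to_6 = success_rate(range(1, 7))   # region x = 1, ..., 6
s_1_to_5 = success_rate(range(1, 6))   # region x = 1, ..., 5
```

Rounded to three places, the two regions give 0.822 and 0.815, matching the values stated in the text.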




It is obvious that this result will become more and more strongly marked, if 
the number of observations increases. This is connected with the subject of 
the next section. 


9. Asymptotically increasing success rate. It seems strange that in the case of a continuously varying parameter and a distribution $P(x \mid \vartheta)$ which is continuous with respect to $\vartheta$ no test can have a success rate $> \tfrac12$. One has the feeling that something might happen in the continuous problems similar to what was the case in the example of section 8. On the other hand our proof that $S \leq \tfrac12$, in sections 4 and 6, is conclusive and it applies to problems in more than 1 dimension also. The answer is that in the kind of problems where a large number of observations is involved a definite restrictive assumption about the over-all distribution $P_0(\vartheta)$ is silently introduced.

The problems we have here in mind are connected with sequences of distributions of the form

(34)    $P_n(x \mid \vartheta) = \Phi_n(x - \vartheta),$

where $\Phi_1(u), \Phi_2(u), \Phi_3(u), \ldots$ are cumulative distribution functions for distributions more and more concentrated around one point, say $u = 0$. In a rigorous form the sequence $\Phi_n(u)$ can be described by the following statement: For each $\varepsilon, \eta > 0$ there exists a number $N(\varepsilon, \eta)$ such that

(35)    $\Phi_n(\eta) - \Phi_n(-\eta) \geq 1 - \varepsilon$ for $n > N(\varepsilon, \eta).$

One wants to test the hypothesis

$-b \leq \vartheta \leq b$

under the assumption that the parameter distribution does not depend on $n$. In this case, as we shall show, one can find for each $\varepsilon > 0$ a region of acceptance $A$ such that the success rate $S_n$ of the test corresponding to this $A$ and to $P_n(x \mid \vartheta)$ is greater than $1 - \varepsilon$ for sufficiently large $n$.

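We divide the region $\bar{H}$, i.e. $|\vartheta| > b$, into two parts $H'_\eta$ and $H''_\eta$, where $H'_\eta$ consists of the points $|\vartheta| < b + 2\eta$ and satisfies the condition

(36)    $\int_{(H'_\eta)} dP_0(\vartheta) = \frac{\varepsilon}{3}.$

Then the region of acceptance will be

$-a = -b - \eta \leq x \leq b + \eta = a,$

and the probability of $x$ falling in this region:

(37)    $P_n(A \mid \vartheta) = \Phi_n(b + \eta - \vartheta) - \Phi_n(-b - \eta - \vartheta).$

As long as $\vartheta$ belongs to $H$ the right hand side in (37) is not smaller than $\Phi_n(\eta) - \Phi_n(-\eta)$ and thus, according to (35), the error chance of first kind is

(38)    $P_E^{(1)} = \int_{(H)} [1 - P_n(A \mid \vartheta)]\, dP_0(\vartheta) \leq 1 - [\Phi_n(\eta) - \Phi_n(-\eta)] \leq \frac{\varepsilon}{3}$ for $n > N(\varepsilon/3, \eta).$

The error chance of second kind can be written as

(39)    $P_E^{(2)} = \int_{(H'_\eta)} P_n(A \mid \vartheta)\, dP_0(\vartheta) + \int_{(H''_\eta)} P_n(A \mid \vartheta)\, dP_0(\vartheta).$

The first of these integrals cannot be larger than $\varepsilon/3$ according to (36), since $P_n(A \mid \vartheta) \leq 1$. The second integral cannot exceed the maximum value of $P_n(A \mid \vartheta)$ for $\vartheta$ in $H''_\eta$. But if $|\vartheta| > b + 2\eta$ the two arguments of $\Phi_n$ in (37) have always the same sign and are in absolute value greater than $\eta$. It then follows from (35), in connection with the fact that $\Phi_n(u)$ increases monotonically from 0 to 1, that the difference of the two $\Phi_n$-values cannot exceed $\varepsilon/3$ for $n > N(\varepsilon/3, \eta)$. Therefore

(40)    $P_E^{(2)} \leq \frac{\varepsilon}{3} + \frac{\varepsilon}{3}$ and $S_n = 1 - P_E^{(1)} - P_E^{(2)} \geq 1 - \varepsilon$ for $n > N(\varepsilon/3, \eta).$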
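The mechanism behind (38) and (40) can be made visible with a concrete sequence of distributions. The sketch below assumes, for illustration only, that $\Phi_n$ is a normal law with standard deviation $1/\sqrt{n}$ (as for the mean of $n$ unit-variance observations) and takes $b = 1$, $\eta = 0.1$. It evaluates (37) at the least favourable parameter values on each side; for a symmetric unimodal law these sit at the edges of the respective regions.

```python
from statistics import NormalDist

b, eta = 1.0, 0.1   # assumed hypothesis half-width and margin eta

def worst_cases(n):
    """Worst-case acceptance probabilities for Phi_n = normal law with sd 1/sqrt(n)."""
    phi_n = NormalDist(0.0, n ** -0.5)

    def prob_accept(theta):
        # (37): P_n(A | theta) for the acceptance interval [-b - eta, b + eta]
        return phi_n.cdf(b + eta - theta) - phi_n.cdf(-b - eta - theta)

    inside_min = prob_accept(b)              # smallest value for |theta| <= b
    outside_max = prob_accept(b + 2 * eta)   # largest value for |theta| >= b + 2*eta
    return inside_min, outside_max

for n in (10, 100, 1000, 10000):
    inside_min, outside_max = worst_cases(n)
    print(n, round(inside_min, 4), round(outside_max, 4))
```

As $n$ grows the first number tends to 1 and the second to 0, so only the thin band $b < |\vartheta| < b + 2\eta$ can contribute appreciably to the error chance, and its a priori weight is controlled by (36).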


This result has a wide range of application in the cases where a hypothesis is tested on the basis of a large number of independent observations. Consider a sequence of variates $X_1, X_2, X_3, \ldots$ subject to probability distributions $Q_1(x_1), Q_2(x_2), Q_3(x_3), \ldots$. Let $x = F(x_1, x_2, \ldots, x_n)$ be a statistical function, i.e. a function depending on the distribution of its $n$ variables only, and $\vartheta$ the expected value of $F$. Then the general law of large numbers states that the distribution of $x$ has the form (34) with $\Phi_n$ satisfying the inequality (35), if the $Q_\nu(x)$ fulfill certain conditions concerning mainly their behaviour at infinity*. The proof of this theorem, which is the real source of most "asymptotical" properties of statistical tests, was given for the first time in 1936. The particular case where $F$ is the arithmetical mean of the $n$ variables $x_1, x_2, \ldots, x_n$ has been known as Tchebychef's theorem since 1867.

Applying this general law of large numbers we can now state the following fact. In testing a hypothesis about the expected value $\vartheta$ of any regular statistical function of $n$ variates we can reach a success rate $1 - \varepsilon$, no matter how small $\varepsilon$ is, if the number $n$ increases indefinitely and the initial distribution of $\vartheta$ is supposed to be independent of $n$. On the other hand, no test with a success rate greater than $\tfrac12$ is available, if an assumption of this type is not used.

* For exact conditions see ref. [3].




10. Summary. In this paper a solution of the problem of testing hypotheses 
is presented in the following sense. It is assumed that a probability distribution 
depending on some parameters is given and that nothing is known about the 
initial distribution of these parameters. For any simple or composite hypothesis 
about the parameters and any region of acceptance chosen in the sample space 
the success rate S is computed, i.e. the minimum chance for getting right answers 
out of the test. From the formulae given for S a test with highest success rate 
can easily be found in each case. 

This theory shares the point of departure with the actually used theory which 
leads to the concept of most powerful tests. A most powerful test is described 
as a test which, by simple reasoning, can be seen to have no smaller success 
chance than any other test on the same "level of significance" $\alpha$. In the rare
cases where most powerful tests exist for all $\alpha$-values, one of them, with an
$\alpha$-value singled out by our theory, has the highest success rate and then is
preferable to all other tests which might have the same success rate. In all other
cases our method supplies a test of highest success rate in no relation to “un- 
biased” tests or other current substitutes for most powerful tests. 

Some of the main results are: No test has a success rate $> \tfrac12$, if nothing is
known about the parameters except the limits of their values and if the 
given distribution is a continuous function of the parameters. The success rate 
can be higher, if the parameters are restricted to certain distinct values. A 
success rate no matter how close to 1 can be reached in a sequence of tests based 
on an increasing number n of observations, if the initial distribution of the 
parameters is known to be independent of n. 


REFERENCES

[1] Robert W. B. Jackson, "Tests of statistical hypotheses in the case when the set of alternatives is discontinuous, illustrated on some genetical problems," Stat. Res. Mem., Vol. 1 (1936), pp. 138-161.

[2] R. v. Mises, "On the correct use of Bayes' formula," Annals of Math. Stat., Vol. 13 (1942), pp. 156-165.

[3] R. v. Mises, "Die Gesetze der grossen Zahl für statistische Funktionen," Monatsh. Mathem. u. Physik, Vol. 43 (1936), pp. 105-128.

[4] J. Neyman, "Sur la vérification des hypothèses statistiques composées," Bull. Soc. Math. de France, Vol. 63 (1935), pp. 246-266.

[5] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory of probability," Phil. Trans., Ser. A, Vol. 236 (1937), pp. 333-380.

[6] A. Wald, "Contributions to the theory of statistical estimation and testing hypotheses," Annals of Math. Stat., Vol. 10 (1939), pp. 299-326.

[7] A. Wald, "On the principles of statistical inference," 1942, Notre Dame Lect. No. 1.





ON THE RELIABILITY OF THE CLASSICAL CHI-SQUARE TEST 


By E. J. Gumbel
New School for Social Research 


For a given set of observations and for a continuous variate, different classifications lead to different observed distributions and to different values of $\chi^2$. This shortcoming has been vaguely felt by statisticians. We shall explain how these differences arise and show that they are important enough to cast a great deal of doubt on the validity of the application of the usual $\chi^2$ method to a continuous variate. Finally, we propose a procedure which is free from these difficulties.


1. The observed distributions. The $\chi^2$ method gives a numerical measure of the differences between the observed and the theoretical distribution. A theoretical distribution is completely determined once the constants are known. For a discontinuous variate the observed distribution is also well defined; but for a continuous variate the concept "observed distribution" is vague. To classify $N$ observations, $x_1, x_2, \ldots, x_m, \ldots, x_N$ arranged in increasing order, we introduce two arbitrary actions: the choice of the intervals and the beginning of the first cell. As a rule, all cells have the same length, and they are bounded by integral numbers, or even numbers, or round numbers, 0, 5, 10, of the variate. But these classifications and the preference given to round numbers for the starting point have no theoretical foundation.

A certain guide for the systematic choice of the class length and the beginning of the first cell may be found by turning to the theory. Many theoretical distributions of a continuous variate $x$ have only two constants, and permit the introduction of a reduced variate $y$ with the dimension zero, where

(1)    $y = \frac{x - a}{b}.$

The constant $a$ is a mean, and $b$ is a measure of dispersion. The probabilities $W(x)$ (or $F(y)$) for values equal to or less than $x$ (or $y$) are

(2)    $W(x) = F(y).$

For most distributions for which the above transformation is possible, tables for $F(y)$ exist, in which the argument progresses by a fixed interval $\Delta y$. By taking an initial value $y_0$ and a fixed interval $\Delta y$, the differences

(3)    $N F(y_0 + i\Delta y) - N F(y_0 + (i - 1)\Delta y) = N p_i \qquad (i = 1, 2, \ldots, k)$

may be interpreted as being the theoretical distribution. The corresponding values of the variate, by (1), are

(4)    $x(i) = a + b(y_0 + i\Delta y); \qquad x(i - 1) = a + b(y_0 + (i - 1)\Delta y)$

and the cell length is

(5)    $\Delta(x) = b\Delta y.$

In (3) $k$ is the number of cells. In general, $x(i)$ and $x(i - 1)$ will not exist among the observed values $x_m$. By arranging the observations in the cells given by the theoretical values (4), we obtain an observed distribution consisting of the contents $a_i$ of the cell $i$. This procedure prescribes a classification of the observations according to the theory. The intervals selected are multiples of some measure of dispersion. In principle, the choice of $\Delta y$ and of the starting point $y_0$ remains arbitrary; in practice, the selection of $\Delta y$ is limited by the intervals given in the probability tables.

This natural classification may be used for constructing different observed distributions from the same set of observations. We determine the constants, then choose a small interval and a starting point which is below the smallest observation $x_1$. The last cell is such that it contains the largest observation $x_N$. In this way, we obtain the initial observed distribution, consisting of $k$ cells.

If we combine $h$ cells ($h = 2, 3, \ldots, \tfrac12 k$), we obtain $h$ different observed distributions: We combine $h - 1$ void cells with the first cell of the initial distribution, we combine the second cell and the following $h - 1$ cells of the initial distribution, and so on. Generally, we combine $q$ void cells ($q = h - 1, h - 2, \ldots, 1, 0$) with the first $h - q$ cells of the initial distribution, then the next $h$ cells of the initial distribution, and so on. The last of these $h$ distributions starts with the first $h$ cells of the initial distribution.
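The rule for forming the $h$ shifted distributions can be sketched in a few lines. The cell contents below are made up for illustration and the function names are not from the paper; a last block that is only partly filled is simply summed as it stands.

```python
def combine_cells(counts, h, q):
    """Pad q void cells in front of the initial counts, then add up blocks of h cells."""
    padded = [0] * q + list(counts)
    return [sum(padded[i:i + h]) for i in range(0, len(padded), h)]

def all_groupings(counts, h):
    """The h observed distributions obtained for q = h - 1, h - 2, ..., 0."""
    return [combine_cells(counts, h, q) for q in range(h - 1, -1, -1)]

initial = [1, 2, 3, 4, 5]          # made-up cell contents, for illustration only
groupings = all_groupings(initial, 2)
```

For `h = 2` this gives the two distributions `[1, 5, 9]` and `[3, 7, 5]`: the same five observations-per-cell figures, the same cell length, and yet two different observed distributions.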

If we combine more and more cells, the number of observed distributions, 
having the same intervals, increases. The larger the intervals the larger is the 
influence of the starting point, and the more the observed distributions become 
dissimilar. To see this influence of classification on the shape of the observed 
distributions, consider the extreme case for a symmetrical theoretical distribu- 
tion of an unlimited variate. Let the observed distribution consist of two cells. 
Assume besides that the observed median is close to the theoretical one. If the 
cut between the cells is identical with the theoretical median, the two cells have 
the contents $\tfrac12 N + \varepsilon$ and $\tfrac12 N - \varepsilon$, where $\varepsilon$ is small. If the cut is shifted
sufficiently far to the left or right of the median, the cell contents will be 0, N and
N, 0. These two distributions are completely different. 

To each observed distribution corresponds a theoretical one obtained from 
(3) by the same combination of cells as the observed distribution. In the graphi- 
cal representation, the same continuous theoretical distribution may be used for 
all observed distributions by choosing the scale of the ordinate properly. The 
length chosen for representing one observation in the initial distribution will 
represent h observations for the h distributions obtained by the combination 
of h cells. 

The different observed distributions corresponding to the same observations and to the same theory will give different values of

(6)    $\chi^2 = \sum_{i=1}^{k} \frac{(a_i - N p_i)^2}{N p_i}.$



























The expected contents of the first and last cell are

(7)    $N p_1 = N F(y_0 + \Delta y),$
(8)    $N p_k = N(1 - F(y_0 + (k - 1)\Delta y)).$

Since the total expected frequency must be equal to the number of observations,

(9)    $\sum_{i=1}^{k} N p_i = \sum_{i=1}^{k} a_i = N,$

formula (6) may be written

(10)    $\chi^2 = \sum_{i=1}^{k} \frac{a_i^2}{N p_i} - N.$

This formula, being simpler than (6), will be used in the numerical example.
An upper limit for $\chi^2$ is furnished by the case that one cell $j$ contains all observations. Then

$a_j = N; \qquad a_i = 0$ for $i \neq j,$

whence from (10)

(11)    $\chi^2 = \frac{N}{p_j} - N.$

The upper limit depends again upon the intervals and the starting point of the classification. If the probability for an observation to be contained in the cell $j$ is small, the upper limit is large.
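That the direct and the shortened form of $\chi^2$ agree, and that the extreme case gives $N/p_j - N$, can be confirmed on a small example. The cell contents and probabilities below are made up for illustration only.

```python
def chi2_direct(observed, probs):
    """Direct form: sum of (a_i - N p_i)^2 / (N p_i)."""
    n = sum(observed)
    return sum((a - n * p) ** 2 / (n * p) for a, p in zip(observed, probs))

def chi2_short(observed, probs):
    """Shortened form: sum of a_i^2 / (N p_i), minus N."""
    n = sum(observed)
    return sum(a * a / (n * p) for a, p in zip(observed, probs)) - n

observed = [3, 7, 12, 8]            # made-up cell contents, N = 30
probs = [0.1, 0.3, 0.4, 0.2]        # made-up cell probabilities, summing to 1

# Extreme case: all N = 30 observations fall in the last cell (p_j = 0.2).
extreme = chi2_short([0, 0, 0, 30], probs)
```

Both forms give the same value for the made-up data, and the extreme case gives $30/0.2 - 30 = 120$.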

The exact distribution of $\chi^2$ has not yet been established. To obtain an approximation, it is assumed that a binomial distribution may be replaced by a normal distribution. As this does not hold for cells with a small expected frequency, the contents of such cells must be combined. This prescription, which is also valid for a discontinuous variate, constitutes a third arbitrary action in the calculation of $\chi^2$. It invalidates the prior postulate that all cells ought to have the same length.

The approximation used for the probability $P$ of obtaining a value of $\chi^2$ equal to or larger than the observed one is

(12)    $P(\chi^2, \nu) = K \int_{\chi}^{\infty} e^{-t^2/2}\, t^{\nu - 1}\, dt,$

where $\nu$ is the number of degrees of freedom. Since

(13)    $\frac{\partial P}{\partial \chi^2} < 0; \qquad \frac{\partial P}{\partial \nu} > 0,$

$P$ diminishes as $\chi^2$ increases, $\nu$ being given, but $P$ increases as $\nu$ increases, $\chi^2$ being given. By choosing larger cells, the number $\nu$ diminishes, and $P$ may remain the same if $\chi^2$ diminishes adequately.

It is easy to see that $\chi^2$ cannot increase as a result of the combination of cells and will, in general, decrease. Let $a_1$ and $a_2$ represent the actual number of observations in two cells that are to be combined. Let $N p_1$ and $N p_2$ be the expected numbers. Then, the contribution of the two separate cells to $\chi^2$ minus the contribution of the two combined cells is, by (10)

$\frac{a_1^2}{N p_1} + \frac{a_2^2}{N p_2} - \frac{(a_1 + a_2)^2}{N(p_1 + p_2)}.$

As $a_1$ and $a_2$ are positive or zero, the difference is proportional to

$a_1^2 p_2^2 + a_2^2 p_1^2 - 2 a_1 a_2 p_1 p_2 = (a_1 p_2 - a_2 p_1)^2 \geq 0.$


The equality holds only when $a_1 : a_2 = p_1 : p_2$. Then, the combination of cells has no influence on $\chi^2$, but it reduces the number of degrees of freedom by one, and diminishes the probability $P$. In the general case, the combination of cells diminishes $\chi^2$ and diminishes $\nu$ at the same time. According to (13), the first influence tends to increase the probability $P$, the second to diminish it. It cannot be stated a priori which influence is stronger.
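The effect of combining two cells can be illustrated on a small made-up example. The sketch merges the first two cells and compares $\chi^2$ before and after; it also shows the proportional case, in which nothing changes. All numbers and function names are illustrative.

```python
def chi2(observed, probs):
    """Chi-square in the shortened form: sum of a_i^2 / (N p_i), minus N."""
    n = sum(observed)
    return sum(a * a / (n * p) for a, p in zip(observed, probs)) - n

def merge_first_two(values):
    """Combine the first two cells into one."""
    return [values[0] + values[1]] + list(values[2:])

observed = [2, 9, 11, 8]            # made-up cell contents, N = 30
probs = [0.1, 0.3, 0.4, 0.2]        # made-up cell probabilities

before = chi2(observed, probs)
after = chi2(merge_first_two(observed), merge_first_two(probs))

# Proportional cells (a_1 : a_2 = p_1 : p_2) leave chi-square unchanged.
proportional = [3, 9, 10, 8]
same_before = chi2(proportional, probs)
same_after = chi2(merge_first_two(proportional), merge_first_two(probs))
```

Here the drop `before - after` equals $(a_1 p_2 - a_2 p_1)^2 / (N p_1 p_2 (p_1 + p_2)) = 0.25$, which is the proportionality stated in the text written out with its constant factor.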

For a given set of observations, a continuous variate and a given theory, which 
includes given estimates of the constants, the probability P depends upon three 
arbitrary actions. If a certain choice of the intervals gives a good fit, it cannot 
be concluded that a broader classification gives the same or a better fit [4]. For 
a given interval, P may vary considerably with the starting point. This influ- 
ence cannot be allowed for by any formula as the number of degrees of freedom 
does not depend upon the starting point. Finally, the term “small expected
numbers” is vague. Different combinations of cells lead to different probabili- 
ties. It is generally assumed that these influences remain within reasonable 
limits and that P does not vary considerably if we change the class length or the 
starting point. In the following example, we shall show that this opinion is 
erroneous. 


2. Numerical example. The flood discharge of the Mississippi River at Vicksburg for each of the fifty years 1890-1939 will be used to illustrate the extent to which the observed distributions and $P$ vary with the choice of cell length and the starting point. The observed flood discharges $x_m$, measured in 1,000 cubic feet per second, are given in Table VI of a previous article [2], and are not repeated here. The expected distribution is given by the theory of largest values, which states that the probability $W(x)$ of a flood discharge equal to or less than $x$ is

(14)    $W(x) = e^{-e^{-y}}.$

Values of $W(x)$ as a function of the reduced variate

(15)    $y = \alpha(x - u),$

are given in Table II of the reference first cited.





Calculation of the constants $\alpha$ and $u$ leads to the theoretical value of the flood discharge

(16)    $x = 1201.9 + 266.1\, y$

associated with a given probability $F(y) = W(x)$.


TABLE I
Observed and theoretical distribution (1) for the interval $\Delta y = .25$; $\Delta x = 66.525$

    Variates (cell beginning at)      Distribution
    Reduced $y$    Absolute $x$       Theoretical $N p_i$

      -1.75           736.2              .5655
      -1.50           802.8              .959
      -1.25           869.3             1.775
      -1.00           935.8             2.720
       -.75          1002.3             3.5955
       -.50          1068.9             4.2315
       -.25          1135.4             4.5475
        .00          1201.9             4.554
        .25          1268.4             4.314
        .50          1334.9             3.914
        .75          1401.5             3.434
       1.00          1468.0             2.934
       1.25          1534.6             2.4565
       1.50          1601.1             2.0235
       1.75          1667.6             1.647
       2.00          1734.1             1.3270
       2.25          1800.6             1.0615
       2.50          1867.2              .844
       2.75          1933.7              .668
       3.00          2000.2              .527
       3.25          2066.7              .414
       3.50          2133.3              .325
       3.75          2199.8              .255
       4.00          2266.3              .1995
       4.25          2332.8              .708

    Totals: observed 50; theoretical 50.000


The first observed distribution presented in Table I is obtained by letting 
Δy = .25; Δx = 66.525 and y_0 = -1.75. The expected number of observations 
for the first and last cell are 50F(-1.5) and 50(1 - F(4.25)) respectively. 





258 E. J. GUMBEL 


The expected frequencies (formula 4) for the other cells 
        Np_i = 50[F(y + .25) - F(y)], 


were obtained by successive subtraction of two consecutive figures given in 
column 2, Table II [2]. The theoretical and the observed distribution are 
plotted in figure 1. The observed distribution given in Table I is very irregular. 

Evidently, the intervals are too small. Therefore, we construct the observed 
and theoretical distributions (2) and (3) for cells which are two times larger. 
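
The theoretical column of Table I can be recomputed directly from formula (14). The following is a minimal sketch, assuming cell limits y = -1.50, -1.25, ..., 4.25 and N = 50 as stated in the text; the function and variable names are illustrative and do not come from the paper. Small differences in the last printed digit are to be expected, since the paper worked from a tabulated F(y).

```python
import math

N = 50  # number of years of record (from the text)


def F(y):
    """Formula (14): probability of a reduced variate not exceeding y."""
    return math.exp(-math.exp(-y))


def expected_frequencies(y_first=-1.5, y_last=4.25, dy=0.25, n=N):
    """Expected numbers per cell: open first cell, interior cells, open last cell."""
    steps = round((y_last - y_first) / dy)
    bounds = [y_first + k * dy for k in range(steps + 1)]
    freqs = [n * F(bounds[0])]                              # 50 F(-1.5)
    freqs += [n * (F(hi) - F(lo)) for lo, hi in zip(bounds, bounds[1:])]
    freqs.append(n * (1.0 - F(bounds[-1])))                 # 50 (1 - F(4.25))
    return bounds, freqs


bounds, freqs = expected_frequencies()
print(len(freqs), round(sum(freqs), 6))
print([round(e, 4) for e in freqs[:3]], round(freqs[-1], 4))
```

The frequencies telescope, so their sum equals N whatever cell limits are chosen; only the split between cells depends on the class length and the starting point.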


[Fig. 1. Interval Δx = 66.525. Observed distribution 1 and theoretical distribution. Upper scale: reduced variable y.]


The first cell in distribution (2) is obtained from distribution (1) by combining 
the first cell of (1) with the empty one before it; the second cell is obtained by 
combining the second and third cells of (1); and so on. 

Distribution No. 3 is obtained by combining the first two cells of distribution 
No. 1, then the third and fourth, and so on. The observed distributions 2 and 3 
and the theoretical distribution are plotted in figure 2. The scale of the ordinate 
is 1/2 of the scale in figure 1. In the same way, the three observed distributions 
(4), (5), (6) for the interval Δy = 3/4, Δx = 199.57 are obtained by combining 
either two void cells with the first cell of Table I, or one void cell with the first 
and second cell of Table I, or the first three cells of Table I (see fig. 3). 

Finally, the four observed distributions (7), (8), (9), (10) for the interval 
Δy = 1; Δx = 266.1 are compared with the theoretical distribution in figures 
4 and 5. The four distributions 7-10 differ considerably. Distributions 8 and 
9 indicate that the agreement between theory and observations is good; 
distributions 7 and 10 indicate that the fit is bad. The χ² method must give the 
same contradictory results. 

[Fig. 2. Observed distributions 2 and 3 and the theoretical distribution. Axes: flood discharges x in 1000 cfs; number of years.]

[Fig. 3. Interval Δx = 199.575. Observed distributions 4, 5 and 6. Axes: flood discharges x in 1000 cfs; reduced variable y.]

[Figs. 4 and 5. Interval Δx = 266.1. Observed distributions 7 to 10. Axes: flood discharges x in 1000 cfs; number of years.]


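TABLE II 
Four values of P(χ²) for the same observations and the same theory 

Column 1: midpoints of the cells. Columns 2 to 5: observed distributions (7), (10), (9), (8). Column 6: theoretical distributions Np_i. Columns 7 to 10: components of χ² + N. 
[Columns 1 to 5 and part of columns 7 to 10 are not legible in this copy; a dash marks an illegible entry.]

    Theoretical Np_i      Component of χ² + N
    (col. 6)              (cols. 7 to 10)

       3.2995                 7.577
       6.0195                10.632
       9.6150                  -
      13.8465                14.155
      15.0945                 9.540
      16.9285                 8.506
      17.6470                  -
      17.3295                  -
      16.2160                  -
      14.5960                  -
      12.7385                  -
      10.8480                  -
       9.0610                  -
       7.4540                  -
       6.0590                  -
       4.8795                  -
       6.3290                  -
       5.0020                  -
       3.9405                 6.344
       3.0965                 2.907

    Distribution        (7)       (10)       (9)       (8)
    χ² + N            57.664     55.813     51.902     50.698
    P                   .023       .057       .399       .705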

The details for the calculations of χ² are given in Table II. The numbers of 
column 1 are the midpoints of the cells. To save space, the four theoretical 
distributions obtained from Table I, col. 4 are written in the same column (6) 
directly opposite the corresponding observed distributions given in columns 
2 to 5. Through formula (10) we calculate the components of χ² + N (cols. 7 
to 10). Although the four distributions differ only with respect to the beginning 












of the first cell, the value of P for the observed distribution number (8) is more 
than thirty times the value of P for the observed distribution number (7). In 
view of the fact that these values of P are calculated for a fixed set of observa- 
tions, for the same theory, the same constants, and the same number of degrees 
of freedom, the differences found are surprising. 
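
The components of χ² + N listed in Table II are the quantities n_i²/(Np_i), whose sum exceeds χ² by exactly N when the observed and expected frequencies have the same total. A minimal sketch of this shortcut follows. The five expected frequencies are theoretical values from column 6 of Table II that sum to 50; the observed counts are hypothetical and serve only to exercise the arithmetic, since the observed columns are not reproduced here.

```python
def chi_square_shortcut(observed, expected):
    """chi-square computed as sum(n_i**2 / (N p_i)) - N."""
    n_total = sum(observed)
    chi2_plus_n = sum(n * n / e for n, e in zip(observed, expected))
    return chi2_plus_n - n_total


def chi_square_direct(observed, expected):
    """chi-square computed as sum((n_i - N p_i)**2 / (N p_i))."""
    return sum((n - e) ** 2 / e for n, e in zip(observed, expected))


expected = [13.8465, 17.3295, 10.8480, 4.8795, 3.0965]  # from Table II, sums to 50
observed = [14, 16, 11, 6, 3]                            # HYPOTHETICAL counts, total 50

print(round(chi_square_shortcut(observed, expected), 4))
print(round(chi_square_direct(observed, expected), 4))

# One printed component can be checked by hand: an observed count of 8
# against the expected value 6.0195 gives 64 / 6.0195.
print(round(8 ** 2 / 6.0195, 3))
```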


3. The probability integral transformation. This example shows that the 
probability P may vary with the starting point in such a way that no conclusion 
about the acceptance or rejection of a hypothesis can be obtained from the usual 
χ² method. The three arbitrary steps described above may be avoided if we 
choose cells of equal probability instead of cells of equal length. The required 
intervals are obtained from the probability integral transformation, due to Karl 
Pearson [5]. Let w(x) be a distribution of a continuous variate x, let y = W(x) 
be the transformed variate, then the distribution p(y) of the variate y is 


(17)    p(y) = 1. 


In other words: The probabilities W(x) are uniformly distributed. If a 
distribution w(x) has been chosen for a given set of observations x_m, we can control this 
theory by investigating whether the “observations” W(x_m), i.e., the theoretical 
cumulative frequencies of the observed values, are uniformly distributed. Thus, 
the comparison of the observed distributions with any continuous theoretical 
distribution is reduced to the comparison of an “observed” with a theoretical 
uniform distribution. To a given set of observations and a given theory there 
is one, and only one, “observed” distribution. If we introduce within w(x) 
another set of constants, or choose instead of w(x) another theory φ(x), we 
obtain, of course, other “observed” values [1]. 

The goodness of fit between this theory and these “observations” may be 
measured by the χ² method. We divide the interval zero to N, which contains 
the N “observed” numbers NW(x_m), into k cells of equal length, and enumerate 
the “observed” points NW(x_m) contained in each cell. The starting point of the 
classification is always zero. The expected number of observations for each cell 
is always N/k. If we choose k sufficiently small, the necessity for combining 
cells is eliminated. We have to choose k in such a way that the conditions, 
under which formula (12) holds, are fulfilled. The question of the best choice 
for the number of cells has been studied by Wald and Mann [3]. Their solution 
is valid for small levels of significance and for large numbers of observations. 
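
The procedure of this section can be sketched in a few lines. The following is an illustration under stated assumptions, not the author's computation: the theory W(x) uses the constants of formula (16), the two samples are hypothetical (the Vicksburg discharges are not reproduced in this article), and the names `W` and `pit_chi_square` are introduced here for convenience.

```python
import math

U, INV_A = 1201.9, 266.1  # constants of formula (16): x = U + INV_A * y


def W(x):
    """Theoretical cumulative probability, formulas (14) and (15)."""
    return math.exp(-math.exp(-(x - U) / INV_A))


def pit_chi_square(xs, k, cdf=W):
    """Count the points N*W(x_m) in k equal cells of (0, N); expected count is N/k."""
    n = len(xs)
    counts = [0] * k
    for x in xs:
        point = n * cdf(x)                      # "observed" point in (0, N)
        cell = min(int(point * k / n), k - 1)   # cells always start at zero
        counts[cell] += 1
    expected = n / k
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return counts, chi2


# HYPOTHETICAL sample 1: ten discharges placed at the quantiles (m - 0.5)/N.
N = 10
even_sample = [U - INV_A * math.log(-math.log((m - 0.5) / N)) for m in range(1, N + 1)]
print(pit_chi_square(even_sample, k=5))

# HYPOTHETICAL sample 2: ten identical, very low discharges.
low_sample = [U - 3 * INV_A] * N
print(pit_chi_square(low_sample, k=5))
```

Because the cells are fixed by k alone, two computers working from the same observations, theory and constants obtain the same counts; the only remaining choice is k.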


4. Conclusion. The usual χ² test is unreliable for a continuous variate as it 
involves three arbitrary decisions. From the same observations, the same 
theory, and the same constants different statisticians, equally well trained and 
equally careful, may obtain different probabilities P, and may proclaim any one 
of these results as final. Therefore, the usual χ² method does not lead to a 
decision whether a hypothesis has to be rejected or not. Such a decision is possible 
if we use the probability integral transformation. Unfortunately, the question 










of the best choice of the cells for small numbers of observations and large levels 
of significance is not yet solved. 


REFERENCES 


[1] E. J. Gumbel, “Simple tests for given hypotheses,” Biometrika, Vol. 32, Parts 3 and 4 
(1942), pp. 317-333. 

[2] E. J. Gumbel, “The return period of flood flows,” Annals of Math. Stat., Vol. 12 (1941), 
pp. 163-190. 

[3] H. B. Mann and A. Wald, “On the choice of the number of class intervals in the 
application of the chi square test,” Annals of Math. Stat., Vol. 13 (1942), pp. 306-317. 

[4] J. Neyman and E. S. Pearson, “Further notes on the χ² distribution,” Biometrika, 
Vol. 22, Parts 3 and 4 (1931), pp. 298-305. 

[5] Karl Pearson, “On a method of determining whether a sample of size n supposed to 
have been drawn from a parent population having a known probability integral 
has probably been drawn at random,” Biometrika, Vol. 25, Parts 3 and 4 
(1933), pp. 379-410. 





A SAMPLING INSPECTION PLAN FOR CONTINUOUS PRODUCTION¹ 


By H. F. Dodge 
Bell Telephone Laboratories, New York 


I. INTRODUCTION 


1. Purpose. This paper presents a plan of sampling inspection for a product 
consisting of individual units (parts, subassemblies, finished articles, etc.) manu- 
factured in quantity by an essentially continuous process. 

The plan, applicable only to characteristics subject to nondestructive inspec- 
tion on a Go-NoGo basis, is intended primarily for use in process inspection of 
parts or final inspection of finished articles within a manufacturing plant, where 
it is desired to have assurance that the percentage of defective units in accepted 
product will be held down to some prescribed low figure. It differs from others 
which have been published² in that it presumes a continuous flow of consecutive 
articles or consecutive lots of articles offered to the inspector for acceptance in the 
order of their production. It is accordingly of particular interest for products 
manufactured by conveyor or other straight line continuous processes. 

In operation, the plan provides a corrective inspection, serving as a partial 
screen for defective⁴ units. Normally, a chosen percentage or fraction f of the 
units are inspected, but when a defective unit is disclosed by the inspection it is 
required that an additional number of units be inspected, the additional number 
depending on how many more defective units are found. The result of such 
inspections is to remove some of the defective units, and the poorer the quality 
submitted to the inspector, as measured in terms of per cent defective, the greater 
will be the corrective or screening effect. The object of the plan is the same as 
that incorporated in some of the sampling tables already published⁵, namely, 
to establish a limiting value of “average outgoing quality” expressed in per cent 


1 Presented at the Joint Meeting of the American Society of Mechanical Engineers and 
the Institute of Mathematical Statistics, May 29, 1943, by H. F. Dodge, Quality Results 
Engineer, Bell Telephone Laboratories, New York. 

2 H. F. Dodge and H. G. Romig, “Single Sampling and Double Sampling Inspection 
Tables”, Bell Sys. Tech. Jour., Vol. XX (1941) pp. 1-61. An unpublished paper by Prof. 
Walter Bartky (developed when he was associated with the Western Electric Co., 1927) 
provides a continuous multiple sampling plan involving two factors: f, as used here, and i, 
the number of units in a “compensating sample” required to be inspected for each defective 
unit found. 

3 Lt. R. J. Saunders, “Standardized Inspection”, Army Ordnance, Vol. XXIV (1943) pp. 
290-292; G. Rupert Gause, “Quality Through Inspection”, Army Ordnance, Vol. XXIV 
(1943) pp. 117-120. 

4 A unit of product that fails to meet the requirement for a characteristic is classed as 
nonconforming with respect to that characteristic, and for convenience is referred to as 
“defective”. Thus, a deviation from a specified requirement or from accepted standards 
of good workmanship is termed a “defect”. 

5 H. F. Dodge and H. G. Romig, loc. cit. 







SAMPLING INSPECTION PLAN 265 


defective which will not be exceeded no matter what quality is submitted to the 
inspector. This limiting value of per cent defective is termed the “average 
outgoing quality limit (AOQL)”. 

The theoretical solution treats the case of inspecting a continuous flow of 
individual units and is based on the distribution of random-order spacing of 
defective units in product whose quality is statistically controlled.⁶ Part III of 
the paper extends the application of the method to a continuous flow of individual 
lots or sub-lots of articles. 


II. INSPECTION OF A FLOW OF INDIVIDUAL UNITS 

2. Inspection of one characteristic. Consider first the inspection of a flow of 
individual units, offered consecutively in the order of their production. As- 
sume that inspection is to be made for only one quality characteristic, so that 
interest will be centered on one kind of defect. Subsequently (Section 13), 
consideration will be given to the procedures when inspection is made simul- 
taneously for several kinds of defects. 


3. Procedure A. The procedure is as follows: 

(a) At the outset, inspect 100% of the units consecutively as produced and 
continue such inspection until i units in succession are found clear of 
defects. 

(b) When i units in succession are found clear of defects, discontinue 100% 
inspection, and inspect only a fraction f of the units, selecting individual 
sample units one at a time from the flow of product, in such a manner as 
to assure an unbiased sample. 

(c) If a sample unit is found defective, revert immediately to a 100% 
inspection of succeeding units and continue until again i units in succession are 
found clear of defects, as in paragraph (a). 

(d) Correct or replace with good units, all defective units found. 
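
Steps (a) to (d) can be sketched as a small simulation. This is an illustration under assumptions that are not part of the paper: each unit is defective independently with probability p, the unbiased sample of step (b) is drawn by selecting each unit independently with probability f, and inspection never overlooks a defect. The function name and the parameter values are hypothetical.

```python
import random


def simulate_procedure_a(p, f, i, n_units, seed=0):
    """Return (fraction of units inspected, outgoing fraction defective)."""
    rng = random.Random(seed)
    screening = True      # step (a): start on 100% inspection
    clear_run = 0         # consecutive units found clear during 100% inspection
    inspected = 0
    defects_passed = 0
    for _ in range(n_units):
        defective = rng.random() < p
        if screening:
            inspected += 1
            if defective:
                clear_run = 0            # step (d): unit corrected or replaced
            else:
                clear_run += 1
                if clear_run == i:
                    screening = False    # step (b): switch to sampling
        elif rng.random() < f:           # this unit is drawn as a sample unit
            inspected += 1
            if defective:
                screening = True         # step (c): revert to 100% inspection
                clear_run = 0
        elif defective:
            defects_passed += 1          # uninspected defective unit goes out
    return inspected / n_units, defects_passed / n_units


fraction_inspected, outgoing = simulate_procedure_a(p=0.02, f=0.1, i=50, n_units=200_000)
print(round(fraction_inspected, 3), round(outgoing, 4))
```

A long run of such a simulation gives empirical values of the average fraction inspected and of the average outgoing quality for a chosen pair of f and i.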


4. Protection provided by the plan. The inspection plan is defined by the 
two constants, f and i, which can be altered at will. For given values of f, i, and 
p (incoming fraction defective), there will result for product of statistically 
controlled quality a definite average outgoing fraction defective (average outgoing 
quality, AOQ). For given values of f and i, the AOQ will have a maximum for 
some particular fraction defective p_1 of incoming quality. As noted above, this 
maximum is referred to as the average outgoing quality limit (AOQL). For all 
other values of incoming fraction defective p greater or less than p_1, the AOQ 
will be less than AOQL. Many combinations of f and i will result in the same 
AOQL. 

The protection offered by the plan discussed here can thus be expressed in 
terms of the AOQL, in per cent defective. 

6 “Statistical control” as defined in the literature; see W. A. Shewhart, Statistical Method 
from the Viewpoint of Quality Control, The Graduate School, U.S. Dept. of Agriculture, 1939. 





266 H. F. DODGE 


5. Theoretical framework. We are concerned with the spacing between 
defective units when the individual units are arrayed in the order of their 
production, as shown in Fig. 1. If the manufacturing process is statistically 
controlled so that the probability of producing a defective unit is constant and equal 
to p, then defective units will have an order spacing of a random character which 
is expressible in terms of certain probability laws. Product turned out by such 
a process will be referred to as having a process average fraction defective p. 
The “event” of particular interest is a “terminal-defect sequence” of i + 1 
successive units following the observance of a defect, comprising a succession of i 
nondefective units followed by a defective unit, as shown in Fig. 1. The totality 
of all possible such sequences, where i varies from 0 to ∞, constitutes the 
universe of events under consideration. 

[Fig. 1. Spacing of defective units. Units are shown in order of production, with a terminal-defect sequence marked.]

Each such sequence of i + 1 units, comprising i successive nondefective units 
followed by a defective one, has a definite probability of occurrence, for a process 
average fraction defective, p. The complete set of such probabilities for all 
possible sequences, having respectively i = 0, 1, 2, 3, ... ∞, defines a probability 
distribution⁷ of random-order spacing of defects in uniform product. This is 
shown in the table below in which 0 represents a nondefective unit, X represents 
a defective one, p is the fraction defective, and q = 1 - p. 


                Defect spacing    No. of nondefective                   No. of term
                (no. of units     units before finding   Probability    in the
    Sequence    in sequence)      the next defect        of occurrence  power series

    X                 1                   0                 p             1st
    0X                2                   1                 pq            2nd
    00X               3                   2                 pq^2          3rd
    000X              4                   3                 pq^3          4th
    0000X             5                   4                 pq^4          5th
    ...             i + 1                 i                 pq^i          (i + 1)st


7 Romanovsky, V., “Due Nuovi Critéri di Controllo Sull'andamento Casuale di Una 
Successione di Valori”, Giornale dell'Instituto Italiano degli Attuari (1932) discusses this 







These probabilities are the successive terms in the infinite power series 

(1)    p + pq + pq^2 + pq^3 + \cdots    or    p(1 + q + q^2 + q^3 + \cdots). 


The sum of this series is p \cdot \frac{1}{1 - q} = 1, i.e., the total probability for all possible 
sequences is unity (as it should be). 

The sum of the first i + 1 terms of the series is the probability of occurrence 
of a “terminal-defect sequence” (defect spacing) of i + 1 units or less. The sum 
of the first i terms is the probability, P_i, of failing to find the next i units clear 
of defects, which is 


(2)    P_i = \sum_{j=0}^{i-1} p q^j = 1 - q^i. 

In turn, the sum of all terms beyond the ith term is the probability of finding 0 
defects in the next i units, which is 


(3)    Q_i = 1 - P_i = q^i. 


These results and the power series (1) enter into subsequent portions of the 
discussion. The curves of Fig. 2 give values of 1 - q^i. 
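
As a numerical check on equations (2) and (3), the partial sum of series (1) can be compared with its closed form. A minimal sketch follows; the values p = 0.02 and i = 50 are illustrative only, and the function names are not from the paper.

```python
def spacing_term(p, j):
    """(j + 1)st term of series (1): j nondefective units, then a defective one."""
    return p * (1.0 - p) ** j


def prob_fail_to_clear(p, i):
    """Equation (2): P_i = 1 - q**i."""
    return 1.0 - (1.0 - p) ** i


def prob_clear(p, i):
    """Equation (3): Q_i = q**i."""
    return (1.0 - p) ** i


p, i = 0.02, 50  # illustrative values, not taken from the paper
partial_sum = sum(spacing_term(p, j) for j in range(i))
print(partial_sum, prob_fail_to_clear(p, i), prob_clear(p, i))
```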

6. Average outgoing quality. Suppose a plan is selected, choosing specific 
values of f and i. 

For given values of i and p, there will be an expected average number of 
units, u, inspected following the finding of a defect. Likewise, for given values 
of f and p there will be an expected average number of units, v, that will be 
passed under the sampling procedure before a defect is found. The latter 
average number includes the sampling units actually inspected as well as the 
uninspected units produced between successive sample units. 

The average fraction of the total product units inspected in the long run is 


(4)    F = \frac{u + fv}{u + v}. 


It is now assumed for purposes of solution that the inspection operation itself 
never overlooks a defect and that all defective units found during the inspection 
of f and i will be corrected or replaced by good units.⁸ 


probability distribution of spacing of events, referring to the spacing as the “length of a 
partial series”. Our term “terminal-defect sequence” has the same significance as his term 
“partial series”. See also P. S. Olmstead, “Note on theoretical and observed distributions 
of repetitive occurrences”, Annals of Math. Stat., Vol. XI (1940) pp. 363-366; A. M. Mood, 
“The distribution theory of runs”, Annals of Math. Stat., Vol. XI (1940) pp. 367-392. 

8 The assumption that the inspection operation is perfect cannot be made without 
reservation. Machine inspection devices have their margins of error. Also, inspection fatigue 
prevents 100% manual and visual inspections from insuring perfection, particularly if such 
inspections continue over a considerable period of time. But the efficiency of the latter 






As a result of the screening effect of the inspection, the average outgoing 
quality, AOQ, designated p_A, is related as follows to the incoming quality p: 


(5)    p_A = p(1 - F) = p(1 - f) \frac{v}{u + v}. 


7. Determination of u. The average number of units, u, inspected on a 100% 
inspection basis following the finding of a defect is a function of i and p, and 
may be determined from a consideration of two power series, one limited and 
the other infinite. 

Once the 100% inspection starts, there are several things that can happen 
before i units are found clear of defects. The first i may be found clear; or 1, 
2, 3, or more defects may be found before finally a run of i units is found clear. 

One of the quantities to be determined is the average number of units inspected 
in a “failure sequence,” that is, one terminating in a defect and comprising i 
or less units. This average number, designated as h, is the average of the 
distribution made up of the first i terms of the power series (1). The average is 


(6)    h = \frac{p(1 + 2q + 3q^2 + 4q^3 + \cdots + i q^{i-1})}{p(1 + q + q^2 + q^3 + \cdots + q^{i-1})}, 


where the denominator is the sum of the probabilities for the first i terms. This 
may be evaluated as follows: 


       h = \frac{p}{1 - q^i} \frac{d}{dq} (q + q^2 + q^3 + \cdots + q^i) 

         = \frac{p}{1 - q^i} \frac{d}{dq} \left[ \frac{q(1 - q^i)}{1 - q} \right] 

(7)      = \frac{1}{p(1 - q^i)} [1 - q^i(1 + pi)]. 

Note that if q^i is small compared with unity, h is approximately 1/p. 
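
Equation (7) can be checked against the defining ratio (6) by direct summation. The sketch below uses illustrative values of p and i that are not from the paper. It also evaluates the two limiting cases: when i is large enough that q^i is negligible, h is close to 1/p, whereas when pi is small, h is close to (i + 1)/2, as follows from expanding (7) in powers of p.

```python
def h_direct(p, i):
    """Equation (6): mean length of a failure sequence of i units or less."""
    q = 1.0 - p
    numerator = sum((j + 1) * p * q ** j for j in range(i))
    denominator = sum(p * q ** j for j in range(i))
    return numerator / denominator


def h_closed(p, i):
    """Equation (7): closed form of the same average."""
    q_i = (1.0 - p) ** i
    return (1.0 - q_i * (1.0 + p * i)) / (p * (1.0 - q_i))


print(h_direct(0.05, 20), h_closed(0.05, 20))   # the two forms agree
print(h_closed(0.05, 400), 1 / 0.05)            # q**i negligible: h near 1/p
print(h_closed(0.001, 10), (10 + 1) / 2)        # p*i small: h near (i + 1)/2
```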

The next step is to determine the average number of failure sequences that will 
be encountered before finding i units clear of defects. This average number, 
designated as G, may be found from the probability distribution of all possible 
numbers of failure sequences, expressed by the infinite series 


(8)    Q_i(1 + P_i + P_i^2 + P_i^3 + \cdots) 


where P_i is given by equation (2), Q_i = 1 - P_i, as given by equation (3), and 
the successive terms are the probabilities of occurrence of 0, 1, 2, 3, etc. failure 


inspections is generally higher when an interest incentive is provided as is usually the case 
in sampling inspection plans where the extent of such inspections hinges on their findings. 

The solution given assumes correction or replacement of defective units. Where it is 
expedient to reject such units and not replace them, equations (19) to (22) inclusive, should 
be modified by replacing i by i - 1. 





[Fig. 2. Curves defining distribution of random order spacing of defects in uniform product. Abscissa: i, number of units. Ordinates: (A) probability of occurrence of a defect spacing of i + 1 or less; (B) probability of failing to find the next i units clear of defects, 1 - q^i (q = 1 - p). Note on chart: by defect spacing, i + 1, is meant the number of consecutive units produced from one defective unit to the next.]



sequences before finding i units clear of defects. The average number of failure 
sequences, G, is given by the sum of the infinite series 


       G = Q_i(0 + 1P_i + 2P_i^2 + 3P_i^3 + \cdots) 
(9)      = Q_i P_i(1 + 2P_i + 3P_i^2 + 4P_i^3 + \cdots). 

Summing the series, we have 


(10)   G = Q_i P_i \frac{1}{(1 - P_i)^2} = \frac{P_i}{Q_i} = \frac{1 - q^i}{q^i}. 

Now u, the average number of pieces inspected following the finding of a 
defect, is made up of a number of failure sequences followed by a run of i units 
clear of defects. Using the average values of G and h just found, we have 


(11)   u = Gh + i = \frac{1 - q^i}{p q^i}. 
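
The combination of G and h into u can be verified numerically. The following sketch, with illustrative values of p and i and function names chosen here, sums the series for G term by term, forms Gh + i, and compares the result with the closed form (1 - q^i)/(p q^i).

```python
def mean_failure_sequences(p, i, terms=5000):
    """Series (9): average number of failure sequences, summed term by term."""
    q_i = (1.0 - p) ** i
    p_i = 1.0 - q_i
    return sum(k * q_i * p_i ** k for k in range(terms))


def mean_failure_length(p, i):
    """Equation (7): average number of units in a failure sequence."""
    q_i = (1.0 - p) ** i
    return (1.0 - q_i * (1.0 + p * i)) / (p * (1.0 - q_i))


def mean_units_screened(p, i):
    """u = G h + i, built from its two components."""
    return mean_failure_sequences(p, i) * mean_failure_length(p, i) + i


def mean_units_screened_closed(p, i):
    """Closed form of u: (1 - q**i) / (p q**i)."""
    q_i = (1.0 - p) ** i
    return (1.0 - q_i) / (p * q_i)


p, i = 0.02, 50  # illustrative values, not taken from the paper
print(mean_units_screened(p, i), mean_units_screened_closed(p, i))
```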


8. Determination of v. The average number of units, v, that will be passed 
in a period of sampling inspection will be 1/f times the average number of 
individual sample units inspected in such periods. Here again the solution will 
depend on the random order spacing of defects in uniform product. Whether the 
individual units selected during the sampling inspection procedure are selected 
by a random spacing device, or by any other means which will prevent known 
bias in the sample, we may assume that defects will be found to occur in 
accordance with the distribution of random order spacing defined by the terms of 
the series given in (1). The average number of sample units inspected in a 
period of sampling inspection will thus be the average defect spacing for product 
having fraction defective, p, which is given by the infinite series 


(12)   H = p(1 + 2q + 3q^2 + 4q^3 + \cdots). 


Summing the series, we have 


(13)   H = p \frac{1}{(1 - q)^2} = \frac{1}{p}, 


and the value of v is found to be 


(14)   v = \frac{1}{fp}. 
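
A short numerical check of (12) to (14), offered as a sketch with illustrative values: the series for H is summed until its terms are negligible and compared with 1/p, and v then follows as H/f.

```python
def mean_defect_spacing(p, terms=5000):
    """Series (12), truncated after `terms` terms."""
    q = 1.0 - p
    return sum((j + 1) * p * q ** j for j in range(terms))


def units_passed_while_sampling(p, f):
    """Equation (14): v = 1 / (f p)."""
    return 1.0 / (f * p)


p, f = 0.02, 0.1  # illustrative values, not taken from the paper
H = mean_defect_spacing(p)
print(H, 1 / p)
print(H / f, units_passed_while_sampling(p, f))
```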


9. Determination of f and i for a given value of AOQL. From the 
considerations given above, the average fraction of the product inspected, F, and the 
value of average outgoing quality, p_A, can be determined for any given values 
of p, f, and i. Substituting in (5), the values of u and v given in (11) and (14), 
we have 


(15)   p_A = p \left[ 1 - \frac{f}{f + (1 - f)(1 - p)^i} \right]. 
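
The substitution leading to (15) can be checked by computing the average outgoing quality in two ways: from u, v and F through (4) and (5), and from the closed form. A minimal sketch, with illustrative values of p, f and i and with function names introduced here:

```python
def aoq_from_components(p, f, i):
    """AOQ from u, v and F, following equations (4), (5), (11) and (14)."""
    q_i = (1.0 - p) ** i
    u = (1.0 - q_i) / (p * q_i)       # units screened after a defect
    v = 1.0 / (f * p)                 # units passed during sampling
    F = (u + f * v) / (u + v)         # average fraction inspected
    return p * (1.0 - F), F


def aoq_closed(p, f, i):
    """Equation (15)."""
    return p * (1.0 - f / (f + (1.0 - f) * (1.0 - p) ** i))


p, f, i = 0.02, 0.1, 50  # illustrative values, not taken from the paper
aoq, fraction_inspected = aoq_from_components(p, f, i)
print(aoq, aoq_closed(p, f, i), fraction_inspected)
```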











The average outgoing quality limit, AOQL, (p_L) is the maximum value of p_A 
that will result for any given values of f and i, considering all possible values of 
p in the submitted product. The value of p for which this maximum value of 
p_A occurs is designated by p_1, hence 


(16)   p_L = p_1 \left[ 1 - \frac{f}{f + (1 - f)(1 - p_1)^i} \right]. 


The value of p_1 for which p_A = p_L is determined by differentiating (15) with 
respect to p, equating to 0, and solving for p, that is 


       \frac{dp_A}{dp} = 1 - \frac{f[f + (1 - f)(1 - p)^i + pi(1 - f)(1 - p)^{i-1}]}{[f + (1 - f)(1 - p)^i]^2} = 0. 


Simplifying, and using the designation p_1 for the maximizing value of p, gives 


(17)   (i + 1)p_1 - 1 = \frac{1 - f}{f} (1 - p_1)^{i+1}, or 


(18)   (1 - p_1)^i = \frac{f[(i + 1)p_1 - 1]}{(1 - f)(1 - p_1)}. 

Substituting in (16) this value of (1 - p_1)^i, we have 


(19)   p_L = \frac{(i + 1)p_1 - 1}{i}, hence 

(20)   p_1 = \frac{1 + i p_L}{i + 1}. 

From (18) and (19), we have 

(21)   i p_L = \frac{1 - f}{f} (1 - p_1)^{i+1}, hence 

(22)   f = \frac{(1 - p_1)^{i+1}}{i p_L + (1 - p_1)^{i+1}}. 


The curves given in Fig. 3 were calculated by choosing values of i for given 
values of AOQL (p_L) and calculating p_1 from equation (20) and f from equation 
(22). Thus for a given AOQL value, an i value may be found for a chosen f 
value and vice versa. It will be noted that for a given value of f, i varies 
inversely with the AOQL value, to a close degree of approximation. 
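
The calculation behind the curves can be sketched as follows: for a chosen AOQL and i, equation (20) gives p_1 and equation (22) gives f, and the result can be checked by confirming that the AOQ of equation (15) reaches the chosen AOQL at p_1 and nowhere exceeds it. The values AOQL = 1% and i = 100 are illustrative, and the function names are not from the paper.

```python
def plan_for_aoql(p_L, i):
    """Return (p_1, f) for a chosen AOQL p_L and clearance number i."""
    p_1 = (1.0 + i * p_L) / (i + 1)          # equation (20)
    tail = (1.0 - p_1) ** (i + 1)
    f = tail / (i * p_L + tail)              # equation (22)
    return p_1, f


def aoq(p, f, i):
    """Equation (15): average outgoing quality."""
    return p * (1.0 - f / (f + (1.0 - f) * (1.0 - p) ** i))


p_L, i = 0.01, 100  # illustrative values, not taken from the paper
p_1, f = plan_for_aoql(p_L, i)
grid_max = max(aoq(k / 10000.0, f, i) for k in range(1, 2001))
print(round(p_1, 5), round(f, 4), aoq(p_1, f, i), grid_max)
```

Repeating the calculation over a range of i values for a fixed AOQL traces one curve of f against i.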


10. Operating characteristics of the plan. Figs. 4(a) and 4(b) give a picture 
of the operating characteristics of the general plan as f and i are varied. They 
indicate for example that for a moderate range of f values the factor i has a 
stronger influence than f in determining the discrimination that the method 
affords between high and low levels of incoming per cent defective. For the 
values of f and i shown, Fig. 4(b) indicates just what level of incoming per cent 
defective would force a correction of the manufacturing process, if the percentage 
of total production that would be accepted on a sampling basis falls below a 
critical value, often a value of the order of 80% to 90%. 

[Fig. 3. Curves for determining values of f and i for a given value of AOQL. Abscissa: i, number of units. Note on chart: p_t = the value of per cent defective, in a consecutive run of N = 1000 product units, for which the probability of acceptance is 0.10 for a sample size f.]

Fig. 5 gives a comparison of the characteristics of several plans having the 
same AOQL value, 1%. It indicates for example that when the normal level 
of incoming per cent defective is well below the AOQL, the AOQL value can be 
assured with less inspection by choosing f small and i large. But since, for a 
given AOQL value, the average amount of inspection approaches a minimum 
as f approaches 0, factors other than the minimum amount of inspection have a 
more important influence on the choice of the most advantageous combination 
of f and i values for a given set of circumstances. For example, when the 
inspector is located at the end of the production line, it may be desirable to use 
a value of i not greater than some small multiple of the number of product units 
on the line at any one time. Or again, the value of f is often influenced by the 
normal work loads of the inspector and the operators on the line. Protection 
against “spotty” quality, such as may arise from temporary irregularities in 
workmanship or materials, should receive special consideration in connection 
with the choice of f. 

[Fig. 4. Curves showing effect of f and i on operating characteristics of plan. (a) Per cent of product units accepted without inspection; (b) per cent of total production accepted on a sampling basis. Abscissa: incoming per cent defective.]

[Fig. 5. Characteristics of three plans having the same AOQL of one per cent. Ordinate: per cent of product units accepted without inspection. Abscissa: incoming per cent defective.]







11. Protection against spotty quality. The p_t scale at the right of Fig. 3 
provides a guide concerning the protection afforded against spotty quality in a 
continuous run of product. The value of p_t is the per cent defective in a run of 
1000 consecutive product units, for which the probability of acceptance by 
sample is 0.10 for a percentage sample equal to the corresponding f value shown on 
the chart. 

This scale indicates that the protection against spotty quality falls off very 
rapidly as f is reduced and that the protection, considering runs of product of 100 
consecutive units each, becomes quite poor if f is less than 2%. 
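
One way to reproduce a p_t value is sketched below. It rests on an interpretation that the paper does not spell out: a run of 1000 units is taken to be accepted when a random sample of f x 1000 units, drawn without replacement, contains no defective unit. Under that assumption p_t is the smallest fraction defective for which this probability is 0.10 or less. The function names are hypothetical.

```python
def acceptance_probability(defectives, sample_size, run_length=1000):
    """Probability that a random sample without replacement holds no defective."""
    if defectives > run_length - sample_size:
        return 0.0
    prob = 1.0
    for k in range(sample_size):
        prob *= (run_length - defectives - k) / (run_length - k)
    return prob


def p_t(f, run_length=1000, target=0.10):
    """Smallest fraction defective in the run with acceptance probability <= target."""
    sample_size = round(f * run_length)
    for defectives in range(run_length + 1):
        if acceptance_probability(defectives, sample_size, run_length) <= target:
            return defectives / run_length
    return 1.0


for fraction in (0.02, 0.05, 0.10):
    print(fraction, p_t(fraction))
```

Under this reading, a smaller f gives a larger p_t, which is the loss of protection the scale is meant to show.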


12. Effect of selecting group samples rather than one unit at a time. The 
above development assumes selection of individual sample units one at a time 
from the flow of product and immediate examination of a unit to determine 
whether or not it is defective. Deviations from this procedure will in general 
result in giving values of AOQL higher than those shown in Fig. 3. 

For example, the actual AOQL may be higher than the theoretical value (a) 
if the inspector delays looking at the individual units immediately when they 
are withdrawn from the line, or (b) if he selects a group of units at one time 
from the production line. The effect of either of these two deviations, both 
constituting a delay, may be quite large if i is small, or if large group samples 
are taken. 

Although the modification of the theoretical AOQL value resulting from the 
selection of group samples has not been thoroughly explored, this should not be 
excessive, 

(a) if group samples of n = 10 or less are drawn from the line, and 
(b) if i = 50 or more, 
provided there is no delay in examining the group samples drawn from the line. 

It should be noted however, that the effect of these delay factors on the AOQL 
may be compensated for in part if, when a defect is found, the 100% inspection 
includes some of the units that have already passed the inspection point. 

Where appreciable delays are unavoidable, an alternative is to withhold from 
acceptance a stipulated number of units pending the examination of the sample 
units that have been selected to represent this quantity of product. Such a 
procedure provides in effect a lot acceptance plan, the treatment for which is 
covered in Part III. 


13. Administration of inspection operations. The inspection plan is most 
effective in practice if it is administered in such a way as to provide an incentive 
to clear up causes of trouble promptly. Such an incentive may be had by im- 
posing a penalty on the operating or manufacturing department when defects 
are encountered. Normally, no such penalty is imposed if both the sampling 
inspection and the 100% inspection are performed by the same person or group 
of persons and the two costs merged; the inspector then merely serves as an 




















































agency for screening defects when quality goes bad. It is accordingly recom- 
mended that the sampling inspection and the 100% inspection operations be 
treated as two separate functions. 

With this in mind, the inspection work can be performed by two different 
inspectors, designated inspector C and inspector M. Inspector C may be 
considered as the consumer’s representative in that his work is performed as a 
function independent of the manufacturing group. The term “consumer” 
is used in the general sense of the recipient of the product after the inspection 
has been completed. Inspector M is responsible to the Manufacturing Depart- 
ment and the cost of his work is borne by that Department. His work must 
however be subject to the surveillance and approval of inspector C. 

The following method of administering the inspection plan can then be used: 

(a) Inspector C inspects the required fraction f. So long as no defects are 
found, product is considered acceptable and is passed. 

(b) When inspector C finds a defect, he 
1. continues inspecting the fraction f, 

2. places some identification on the succeeding flow of product to indicate 
nonacceptance (or diverts it from the regular production line if the 
design of the line permits), such designation to apply until clearance 
is obtained in accordance with paragraph (c), and 

3. calls inspector M to inspect the succeeding flow of product in accord- 
ance with paragraph (c). 

(c) Inspector M (one or more inspectors as needed) inspects all succeeding
units, except those inspected by inspector C in the fraction f, until the
required number of units, i, are found clear of defects. Inspector M
reports immediately to inspector C all defects found in the course of his
100% inspection and notifies him when a run of i units has been found
clear of defects.

(d) When notified that a run of i units has been found clear of defects, in-
spector C, if satisfied with the work of inspector M, releases inspector M.

(e) To facilitate speedy correction of causes of trouble, inspector C, on finding 
a defect, should promptly notify the production foreman or other desig- 
nated authority and furnish the latter with detailed information regarding 
the character of the defect found. 

It will be noted that the above procedure requires calling inspector M whenever 
inspector C finds a defect. To avoid taking such action on the occurrence of 
a single defect, the procedure can be modified so that inspector M is called into 
the picture only when two defects in succession are observed by inspector C. 
Where this feature is desired, paragraph (b) above may be modified to read 
as follows: 

(b) When inspector C finds a defect, he 
1. proceeds immediately to inspect all succeeding units up to a total of
i units, and if no defects are found therein, he again limits his inspection
to the fraction f. If, on the other hand, during the course of inspecting
the next i units, inspector C finds a second defect, he immediately
discontinues his 100% inspection,

2. places some identification on the succeeding flow of product ... etc.
While this procedure carries the disadvantage of placing a varying work load
on inspector C, it is often preferred, since a single defect tends to be regarded as
an isolated occurrence whereas two defects in quick succession (like a first and
second offense) are normally accepted as sufficient evidence to justify special
action.
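
As an illustration of paragraphs (a) to (d), the routine can be sketched as a two-state loop. The Python below is only a sketch under stated assumptions: the name `simulate_plan`, the random selection of the fraction f, the representation of product as a list of booleans, and the convention that units seen by either inspector count toward the run of i clear units are all choices made for the example and are not fixed by the text. Defects that are found are assumed to be corrected or replaced.

```python
import random


def simulate_plan(units, f, i, seed=0):
    """Sketch of paragraphs (a)-(d); illustrative only.

    units: booleans in production order, True meaning defective.
    f:     fraction sampled by inspector C (assumed chosen at random).
    i:     number of successive clear units needed to release inspector M.
    """
    rng = random.Random(seed)
    screening = False      # True while inspector M is doing 100% inspection
    clear_run = 0          # successive inspected units found clear
    counts = {"by_C": 0, "by_M": 0, "uninspected": 0, "defects_passed": 0}

    for defective in units:
        sampled = rng.random() < f          # inspector C keeps sampling f
        if sampled:
            counts["by_C"] += 1
        elif screening:
            counts["by_M"] += 1             # M takes every unit C does not
        else:
            counts["uninspected"] += 1
            counts["defects_passed"] += int(defective)
            continue

        if defective:
            clear_run = 0                   # any defect restarts the run
            if sampled:
                screening = True            # paragraph (b): C calls M
        elif screening:
            clear_run += 1
            if clear_run >= i:              # paragraph (d): M is released
                screening = False
                clear_run = 0
    return counts


# Hypothetical flow of product with about 2% defective units.
flow_rng = random.Random(1)
flow = [flow_rng.random() < 0.02 for _ in range(10_000)]
print(simulate_plan(flow, f=0.1, i=50))
```

The per-inspector counts make the cost split visible: the work charged to inspector M grows only when defects are actually encountered, which is the incentive the section describes.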


14. Inspection for several kinds of defects simultaneously. The procedure 
given above may be applied directly to an inspection covering two or more kinds 
of defects, provided that the chosen AOQL value applies to all defects collectively 
and each unit inspected is always inspected for all of the defects under considera- 
tion. 

It is sometimes desired, however, when a defect of one kind is observed, to 
confine the 100% inspection to this one kind of defect alone. This requires a 
modification of the general procedure and the establishment of a separate AOQL 
for each kind of defect. A similar modification is required, for example, where
the inspection is to cover several kinds of defects, but where the defects are 
grouped into two or more classes, according to their seriousness, and the defects 
in each class treated collectively. 

The following paragraphs outline for illustrative purposes a procedure for 
use where the defects under consideration are to be classified into two groups, 
Major and Minor, and where all Major defects are to be treated collectively 
and all Minor defects likewise. By analogy, the procedure to be followed when 
each kind of defect is to be treated separately will be obvious. In any event, 
the fraction f is made the same for all classes or all kinds of defects. 

Procedure 
Several kinds of defects are grouped into two classes with respect to serious- 
ness; designated Major and Minor. 
All defects of the same class (Major or Minor) are treated collectively. 
Preliminary 
(1) Establish an overall AOQL value for Major defects and an overall AOQL
value for Minor defects. Select a suitable value for f, applicable to both
Major and Minor defects. From Fig. 3 determine a value of i for Major
defects, designated $i_A$, and a value of i for Minor defects, designated $i_B$.

(2) At the outset, inspect 100% of the units consecutively for both Major
and Minor defects until $i_{max}$ units in succession are found clear of defects
($i_{max}$ = $i_A$ or $i_B$, whichever is the larger).

Routine 

(3) When $i_{max}$ units in succession are found clear of defects, discontinue 100%
inspection and inspect only a fraction f of the units for both Major and 
Minor defects, selecting individual sample units one at a time from the 
flow of product. 





(4) If a Major (or Minor) defect is observed during sampling inspection,
inspect 100% of the succeeding units only for defects of the class in question
until $i_A$ (or $i_B$) units in succession are found clear of defects of this class.

(4.1) During the 100% inspection referred to in (4), inspect a portion f
for both Major and Minor defects.

(4.2) If during the 100% inspection for a particular class of defect (Major 
or Minor), a defect of the other class is observed on an individual 
unit of product, start 100% inspection for defects of the new class 
only if the new defect is observed on one of the f units that has been 
inspected for both Major and Minor defects, and continue such 
100% inspection for defects of the new class until i (as determined
in (1) for the new class) units in succession are found clear of defects 
of the new class. Do not take such action, however, if the new 
defect happens to be observed on one of the non-f units. 

(5) When the proper number of successive units are found clear of defects 
as in paragraph (4) or (4.2), reinstate sampling inspection as in para- 
graph (3). 

From the above it may be appreciated that difficulties of administration are 
introduced in treating a large number of classes of defects or a large number of 
individual defects separately. How best to group defects together for collective 
treatment can generally be determined from the nature of the inspection opera- 
tions, whether visual or gauging, and the expectancy of defects as determined 
from the quality history. Items involving visual inspection can often be treated
collectively to advantage. 


As is generally true, the layout of an inspection plan depends to a considerable 
extent on the nature of inspection operations to be performed. Simplicity of 
administration is always to be desired. From the standpoint of minimizing 
overall inspection costs, it is often preferable, where several quality character- 
istics are to be inspected, to break down the inspection work into two or more 
separate inspection steps, each covering a relatively small number of char- 
acteristics. 


III. INSPECTION OF A FLOW OF INDIVIDUAL LOTS OR SUB-LOTS


15. Purposes of Inspection. A manufacturer’s inspection of his own product
serves two purposes*:

(a) Process Control: To provide a basis for action with regard to the pro-
duction process with a view to better future product.

(b) Product Acceptance: To provide a basis for action with regard to the
product already at hand.

The plan outlined in Part II has both of these purposes in mind, but the provi- 
sion for selecting sample units continuously from the production line places 
special emphasis on control. It aids, for example, in the prompt detection of 
defects and location of causes of trouble in the manufacturing process. 


*See A. S. A. War Standard Z1.3, Control Chart Method of Controlling Quality During
Production, pp. 5-6, 1942, American Standards Association, New York. 





The problem of acceptance of product is often eased, though at some sacrifice
to the control aspects of the inspection work, if product is submitted to the 
inspector in lots or sub-lots and a sample taken from each. 


16. Inspection procedure for sub-lots. With minor modifications, the plan 
and procedure of Part II can be extended to the case where material is offered 
as a flow of consecutive sub-lots of articles. In the inspection of parts, for example, 
the material may be offered in pan-loads or trays, each containing a collection 
of parts produced under essentially the same conditions. Or again, the product 
from a common source for a given short period of time, such as a half-hour, 
one hour, etc., may often be treated as a sub-lot and offered to the inspector as 
such for his acceptance. In what follows, however, it is essential that such 
sub-lots be kept in the order of their production.

The theoretical development given in Part II makes use of random-order 
spacing of defects in a statistically controlled product, with the specific provision 
that the units inspected be selected in the order of their production. In applying 
the general plan to the inspection of a flow of consecutive sub-lots, we no longer 
have individual units available in the order of their production. But we can 
use the same theoretical framework if we consider the random spacing of defects 
as their spacing in the chain of inspected units arranged in the order of their 
inspection. The probability distribution of the spacing of defects in inspected 
units will be the same regardless of the manner of selecting the units to be 
inspected, so long as we hold to the concept of statistical control in our solution. 

The “i units in succession to be found clear of defects,” discussed in Part II,
will now be defined as i consecutively inspected units. During sampling inspec-
tion, a group sample of units will be selected from each sub-lot, and the fraction
f will relate to the ratio of the number of units in the sample to the total number
of units in the sub-lot. The fraction f will be held constant for all sub-lots.
Furthermore, when it is required under the general plan to find i inspected units
in succession clear of defects, the 100% inspection must be allowed to extend
into immediately succeeding sub-lots if i units in succession are not found clear
in the current sub-lot.


17. Procedure B. The procedure is as follows: 

(a) At the outset, start inspecting 100% of the units in a sub-lot and continue
such inspection until i inspected units in succession are found clear of
defects. Extend the 100% inspection, if necessary, into one or more
succeeding sub-lots in the order of their production.

(b) When i inspected units in succession are found clear of defects, discontinue
100% inspection and inspect only a fraction f of the units from each of
the sub-lots, selecting the sample units in such a way as to fairly represent
the sub-lot.

(c) If a sample unit is found defective, start a 100% inspection of the re-
mainder of the sub-lot, and continue the 100% inspection until again i
inspected units in succession are found clear of defects, as in paragraph
(a), extending such inspection into succeeding sub-lots, if necessary.

(d) In the event the 100% inspection extends into one or more succeeding
sub-lots, if the number of units inspected in the last of such succeeding
sub-lots exceeds a fraction f of the number of units in the sub-lot, accept
this last sub-lot without further inspection. If, on the other hand, the
number of units inspected in this last sub-lot is less than the fraction f,
inspect additional units from this same sub-lot to make up a sample equal
to a fraction f of the number of units in the sub-lot.

(e) Correct or replace with good units all defective units found. 

As was the case in Part II, the inspection plan is defined by two constants, 
f and i, and the protection offered is expressed in terms of AOQL. This sub-lot
inspection plan differs from those already published in that the screening action 
is not confined to a single sub-lot but may extend over a succession of sub-lots, 
the entire production being regarded as a train of sub-lots that are linked together 
for purposes of inspection in the order of their production. 


IV. REMARKS


It will have been noted that the plan here outlined should be regarded as a 
“special purpose” plan applicable under the conditions which have been enu- 
merated—where production is practically continuous, where inspection is to be 
made during production or immediately thereafter and is to serve not only as 
a screening acceptance agency if necessary, but as an aid to process control by
disclosing promptly any sub-standard quality conditions in the product. It
is believed that the general plan provides a structure which, with possible var-
iations in procedure to serve particular circumstances, may be found useful in
designing additional sampling inspection techniques.





ON THE THEORY OF RUNS WITH SOME APPLICATIONS TO 
QUALITY CONTROL¹


By J. WOLFOWITZ


Columbia University


1. Recent developments in the theory of runs. The increasing number and
importance of recent advances in the theory and statistical applications of runs 
may make a brief paper on the subject of some interest. The large volume of 
material and its wide dispersal, together with the limitations of space, will of 
necessity make these remarks far from exhaustive and complete. 

I shall not define a run because new advances and applications of new criteria 
to new problems would probably soon render most definitions obsolete. Runs 
as used in statistics are best characterized by a philosophy and a technique rather 
than by the employment of any one specific device. What is always involved is 
the ordering of observations according to some characteristic and the resultant 
effect of this ordering on the ordering according to some other characteristic. 
For example, if the seats at a meeting of statisticians and engineers are numbered 
and occupied by m engineers and n statisticians, then if we list the numbers of the 
occupied seats in ascending order and replace each number by E or S according 
as the seat is occupied by an engineer or statistician, we shall have a sequence of 
m + n elements, m E’s and n S’s. Thus, if m = 7 and n = 6, such a sequence
might be

EEESEESSSESSE.


If we were interested in knowing how well engineers and statisticians are ac- 
quainted with one another, we should find it of interest to study the runs of E’s 
and S’s in this sequence. Any subsequence of consecutive E’s or S’s which can-
not be enlarged is called a run. Thus in the example above there is a run of E’s
of length 3, followed in order by a run of S’s of length 1, a run of E’s of length 2, a
run of S’s of length 3, a run of E’s of length 1, a run of S’s of length 2, and a
run of E’s of length 1. Runs of this kind are usually called runs of two kinds of
elements. Naturally the characteristic according to which we order (in the 
example above, seat number) and the characteristic whose runs are observed 
(E or S) may be various. They ought in general to have a meaningful connec- 
tion. 
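
The decomposition of the seating example into runs can be checked mechanically. The following is a minimal sketch; the helper name `runs` is an assumption introduced for the illustration.

```python
from itertools import groupby


def runs(sequence):
    """Return the maximal runs of a sequence as (element, length) pairs."""
    return [(key, len(list(group))) for key, group in groupby(sequence)]


seating = "EEESEESSSESSE"          # the example with m = 7 and n = 6
seating_runs = runs(seating)
print(seating_runs)
print("number of runs:", len(seating_runs))
```

`groupby` collects consecutive equal elements, which is exactly the condition that a run cannot be enlarged.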

The order of observations has no value if it is known that the observations are 
independent and random from the same universe and one seeks to estimate a 
parameter of the universe. Many of the statistical problems treated in the 
literature are of this character. In quality control of manufactured articles one 


1 Revised from an expository address delivered at a joint meeting of the Institute of 
Mathematical Statistics and the American Society of Mechanical Engineers at New York, 
May 29, 1943, at the invitation of the program committee. 



of the fundamental problems is to decide whether the observations are “random,”
or in the language employed in this field, whether statistical control exists. For
this purpose indiscriminate pooling of data which suppresses the order charac-
teristics of the observations represents a loss of valuable information.

The algebra of runs of two kinds of elements is fairly elementary and most of
the distribution problems involved have been solved. Suppose an urn contains
m white balls and n black balls, thoroughly mixed, and m + n drawings are
made without replacement. There are $\binom{m+n}{m}$ different sequences of W’s
and B’s possible, and each sequence has the same probability. Let us find in
how many ways the m elements W can be arranged to give k runs. By a trick
due to Euler, this is the coefficient of $x^m$ in the purely formal expansion of

$$(x + x^2 + \cdots + x^m)^k,$$

which is the same as the coefficient of $x^m$ in the formal expansion of

$$(x + x^2 + x^3 + \cdots)^k = \left(\frac{x}{1-x}\right)^k,$$

and is therefore $\binom{m-1}{k-1}$ (which is, of course, the combinatorial symbol for
$\frac{(m-1)!}{(m-k)!\,(k-1)!}$).

It is easy to see that the number of sequences of W’s and B’s which have 2k
runs of both kinds is

$$2\binom{m-1}{k-1}\binom{n-1}{k-1},$$

and hence that the probability that U, the number of runs of both kinds, be
2k is

$$2\binom{m-1}{k-1}\binom{n-1}{k-1}\bigg/\binom{m+n}{m}.$$

The details of this and other relevant derivations can be found in Wilks [1], 
Mood [2], Wald and Wolfowitz [3], and Stevens [12]. The formulae given there 
are of the type given above; e.g., for the probability that U = c. Application 
to tests of significance usually requires formulae of the type which give the proba- 
bility that U < c. This causes some difficulty in application and raises a need 
for suitable tables. Useful tables have been given by Swed and Eisenhart [4] 
and by P. S. Olmstead in an article by Mosteller [5]. The latter table really 
deals with a special case of runs of two kinds of elements. 
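
For small m and n the count of sequences with 2k runs can be verified by direct enumeration. The sketch below does this; the function names are assumptions made for the illustration, and only the closed form for an even number of runs, as stated in the text, is used.

```python
from itertools import combinations
from math import comb


def count_runs(seq):
    """Total number of runs in a sequence of two kinds of elements."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def sequences_with_u_runs(m, n, u):
    """Brute-force count of arrangements of m W's and n B's with u runs."""
    total = 0
    for white_positions in combinations(range(m + n), m):
        whites = set(white_positions)
        seq = ["W" if j in whites else "B" for j in range(m + n)]
        total += int(count_runs(seq) == u)
    return total


def count_u_even(m, n, k):
    """Closed-form number of sequences with exactly 2k runs."""
    return 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)


def prob_u_even(m, n, k):
    """Probability that U = 2k when all sequences are equally likely."""
    return count_u_even(m, n, k) / comb(m + n, m)


for k in range(1, 5):
    print(k, sequences_with_u_runs(5, 4, 2 * k), count_u_even(5, 4, k))
```

Enumeration is feasible only for small samples, which is why tables such as those cited above are needed for practical use.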

The devices described above were systematically utilized by Mood [2] to give
a valuable collection of formulae. A representative result is that the joint dis-
tribution of the numbers of runs of length 1, 2, ..., p and all those of length
greater than p is asymptotically normal, with means and covariance matrix
given.






The results given by Mood are limited to a classification of runs into a finite 
number of classes. The author [6] has given a general result which permits 
weighting runs of all lengths. 

Closely allied to runs of two or more kinds of elements are runs from a bino-
mial or multinomial population. If the observations are classified into k classes,
designated by 1, 2, ..., k say, and each observation has a constant probability
$p_i$ of falling into the i-th class (i = 1, 2, ..., k), then a sequence of l observations
all of which belong to the same class and which is preceded and followed by ob-
servations which belong to another class (except, of course, when the sequence
is at the beginning or at the end of the series) is called a run of length l. If a
coin, whether unbiassed or not, is tossed repeatedly, the runs of heads and tails
are runs from a binomial population (i.e., k = 2) and if the coin is unbiassed,
$p_1 = p_2 = \tfrac{1}{2}$.

The algebra of these runs has been studied mainly by von Bortkiewicz [7], 
von Mises [8], Wishart and Hirshfeld [9], Cochran [10], and Mood [2]. Runs 
from a binomial population (say) differ from runs of two kinds of elements in 
that m and n (defined above) are chance variables. If therefore, in general, a
distribution formula valid for a fixed m and n be multiplied by the probability
of this particular set of m and n, namely $\binom{m+n}{m} p_1^m p_2^n$, and summed over m and n,
the result will be the corresponding distribution formula for runs from a binomial
population. Von Bortkiewicz [7], Cochran [10] and Mood [2] derived the essen- 
tial parameters involved. Wishart and Hirshfeld [9] proved the asymptotic 
normality of the total number of runs from a binomial population, and these 
results were generalized by Mood [2]. 

Von Mises [8] proved that if N be the number of observations from a binomial 
population, the distribution of the number of runs of a length which is of the 
order of log N approaches the Poisson distribution with increasing N. 

Cochran [10], extending the work of Gold [11], made use of runs of this kind in 
order to study what they called “the persistence of weather”, i.e., whether dry
months tend to follow dry months and wet months to follow wet months. In a
long series of weather observations the months were classified as wet or dry and 
a four-fold table constructed of the number of months falling into each of the 
following categories: 

(a) wet month following a wet month 

(b) wet month following a dry month 

(c) dry month following a wet month 

(d) dry month following a dry month. 

The chi-square test was applied to the four-fold table to test the null hypothesis 
that the probability of whether a month was wet or dry was independent of what 
its predecessor had been. 
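
The construction of the four-fold table from a series of months can be sketched as follows. This is an illustration only: the function names and the coding of months as "W" and "D" are assumptions, and the statistic computed is the ordinary Pearson chi-square for a 2 x 2 table without continuity correction, which may differ in detail from the computation in [10].

```python
def fourfold_table(months):
    """Count month-to-month transitions.

    months: sequence of "W" (wet) or "D" (dry) in time order.
    Returns [[WW, WD], [DW, DD]], rows indexed by the preceding month
    and columns by the month that follows it.
    """
    index = {"W": 0, "D": 1}
    table = [[0, 0], [0, 0]]
    for previous, current in zip(months, months[1:]):
        table[index[previous]][index[current]] += 1
    return table


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / margins


series = "WWWDDDWWWDDD"            # hypothetical record of twelve months
table = fourfold_table(series)
print(table, chi_square_2x2(table))
```

A series of N months yields N - 1 transitions, so the table total is one less than the length of the record.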

Olmstead [13] has made use of a run which is very similar to that of a run from 
a binomial population, except that the sequence terminates whenever an obser-
vation on a specified one of the two classes (a “failure”) is recorded. The author
[6] has used a run defined as a sequence of consecutive integers in a permutation
of the first n integers to test whether two variates are independently distributed 
when nothing is known about their distribution functions except that they are 
continuous. The rank correlation coefficient is usually employed for this purpose. 
Of great importance in quality control of manufactured output are runs up and 
down. If, in any of the n! equally likely (by hypothesis) permutations of the 
first n integers, we subtract each element from its successor and replace the result 
by + or — according as the difference is positive or negative, we get runs of + 
signs and — signs, called respectively runs up and down. The usage of the term 
length varies; in this paper we shall say that the length of a run is the number of 
+ or − signs in it. This has the advantage that then the sum of the lengths of
all the runs is n − 1. (Most quality control literature, which follows Shewhart
[14] and Kermack and McKendrick [15], defines the length of a run as one more
than the number of + or − signs in it.) Thus, for example, the sequence

3 4 7 6 5 1 2

will appear as

+ + − − − +


after the + and — signs have been inserted, and has an ascending run of length 
2, followed by a descending run of length 3, followed by an ascending run of 
length 1. 
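
The sign sequence and the runs up and down of the example can be reproduced with a few lines of code. This is a minimal sketch; the name `runs_up_down` is an assumption, the length of a run is taken as the number of signs in it (the convention of this paper), and the values are assumed to contain no ties.

```python
from itertools import groupby


def runs_up_down(values):
    """Return runs up and down as (sign, length) pairs.

    The length of a run is the number of + or - signs in it.
    Successive values are assumed to be unequal.
    """
    signs = ["+" if b > a else "-" for a, b in zip(values, values[1:])]
    return [(sign, len(list(group))) for sign, group in groupby(signs)]


example = [3, 4, 7, 6, 5, 1, 2]
print(runs_up_down(example))
```

Under this convention the lengths add up to one less than the number of observations; under the Shewhart convention each length would be larger by one.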

The distributions associated with runs up and down in general present mathe- 
matical difficulties greater than those associated with distributions of runs of two 
kinds of elements and the results are far from complete. The asymptotic 
expectation of $r_p$, the number of runs of length p, was given with great brevity
by Fisher [16] and in detail by Kermack and McKendrick [15], and the exact 
result was supplied by Wallis and Moore [17]. The matrix of covariances among 
the runs of various lengths is being computed, and, it is hoped, will be available 
for publication shortly. As far as the author is aware, no explicit formulae 
giving the probability that $r_p = k$ or that $r_p < k$ are known. Some recursion
formulae of limited usefulness are available. 

The author has recently obtained the asymptotic distributions of $r_p$, of
$r_{p_1}, r_{p_2}, \ldots, r_{p_k}$ jointly, and of related statistics. These are jointly normal.
Hence certain quadratic forms in these variables have approximately the chi- 
square distribution. 

Anticipating somewhat the discussion to be given below, it may be mentioned 
here that the quadratic forms in certain of the $r_p$ which Kermack and McKend-
rick [15] use to test for randomness do not have the chi-square distribution which
Kermack and McKendrick ascribe to them. Wallis and Moore [17] first pointed
out that these quadratic forms were not the proper chi-square statistics for good-
ness of fit because of correlation among the $r_p$. The author’s recent results
show that these forms do not have the chi-square distribution. 





2. Remarks on applications of runs. Let us now turn to statistical applica-
tions of some of the runs described above. Suppose we have a sample of m
random independent observations on one variate and a similar sample of n
observations on another variate. Suppose further that nothing is known a
priori about the distribution of each except that both are continuous, and it is
desired to test whether the two distributions are identical. This problem is of
great practical importance and occurs frequently. In quality control of manu-
factured output it may occur, for example, if we wish to test whether the output
of two machines, two workers, two different processes, or that from raw material
obtained from two different sources, is the same. Naturally the problem not 
only of two, but in general, of a larger number of samples may arise. 

The solution proposed in [3] is as follows: Let the m + n observations be 
arranged in order of, say, ascending size, and let each observation be replaced by 
F or S according as it comes from the first or second sample. The total number
U of runs in both F and S is the statistic to be used. Small values of U are the 
critical values for rejecting the hypothesis of identity of distributions. Thus in 
the example above of the seating of statisticians and engineers in the auditorium, 
a small value of U, which implies that the S (statisticians) and the E (engineers) 
each tend to bunch together, would be regarded as evidence that the statisticians 
and engineers present are not well acquainted with one another. 
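
The computation of U from two samples, and of the probability of a value as small as that observed, can be sketched for small samples as follows. The function names are assumptions for the illustration; the lower-tail probability is obtained by enumerating all equally likely arrangements rather than from the tables cited earlier, and the observations are assumed to have no ties (as the continuity assumption implies).

```python
from itertools import combinations


def total_runs(first, second):
    """Statistic U: total number of runs of F and S in the pooled ordering."""
    labelled = sorted([(x, "F") for x in first] + [(x, "S") for x in second])
    labels = [label for _, label in labelled]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def lower_tail_probability(m, n, u):
    """P(U <= u) under the null hypothesis, by complete enumeration.

    Practical only for small m and n, since C(m + n, m) arrangements
    are examined.
    """
    size = m + n
    hits = total = 0
    for positions in combinations(range(size), m):
        chosen = set(positions)
        labels = [j in chosen for j in range(size)]
        runs = 1 + sum(a != b for a, b in zip(labels, labels[1:]))
        hits += int(runs <= u)
        total += 1
    return hits / total


# Hypothetical measurements from two machines.
machine_1 = [10.2, 10.4, 10.1, 10.3]
machine_2 = [10.9, 11.1, 10.8, 11.0]
u = total_runs(machine_1, machine_2)
print(u, lower_tail_probability(len(machine_1), len(machine_2), u))
```

Small values of U fall in the critical region, so the lower tail is the relevant probability.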

The statistic U seems a not unreasonable one for the purpose. A discrepancy 
between the two distribution functions will make alternation of values of the two 
variates less frequent. This idea was proved for large n in [3], where a gener- 
alized concept of statistical consistency is given. 

On the other hand, the choice of U as a statistic is arbitrary; other reasonable 
criteria can certainly be given (see, for example, Dixon [19]). In [3] it is shown
that a criterion which had previously been proposed was not acceptable because 
the statistic was not consistent, but nevertheless consistency is a property en- 
joyed by many statistics and constitutes only a partial check on the arbitrariness 
of choice. An “abnormally” long run in one or both variates which would be 
regarded by “common sense” as an indication that the hypothesis ought to be
rejected, might be accompanied by a large number of runs of length one which 
might make the value of U not critically low. Some writers suggest that the 
presence of a long run of sufficient length be regarded as indicating rejection of 
the null hypothesis. In that case, if most of the runs were comparatively long, 
while none were critically long, the null hypothesis would not be rejected under 
this criterion, but the value of U would be small. A step has been made in the 
direction of setting up a criterion for the choice of statistic ([6]) so as to remove
this arbitrariness. This involves an extension of the likelihood ratio principle. 
It must be remembered, however, that almost any criterion will fail to reject 
some sequence which, it seems intuitively, ought to be rejected. All statistical 
inference involves risks of error; one object of the science of statistics is to mini- 
mize these risks. 

Another possible test for the problem of two samples is to compare the num-
bers of runs of various lengths with their expected numbers by the proper chi-
square (Caution: the correlation among the variates must be taken into account).
The author [6] has developed another test from an extension of the likelihood
ratio.

Whenever a uniformly most powerful test does not exist, and this is the case 
in most non-parametric problems, it is not usually possible to say that one test 
is more powerful than another, unless the set of alternatives is sufficiently de- 
limited. The power function is then the ultimate criterion for the choice of 
statistic. 

If a sequence of n unequal numbers be given, a very important question is to 
decide whether the sequence is a “random” one; if it is and the sequence repre-
sents measurements on a characteristic of successive products of some manufac-
turing process, the latter is said to be in statistical control. A precise mathe- 
matical formulation can be given to this statement about randomness. Let 
$X_1, X_2, \ldots, X_n$ be chance variables, and let $x_1, x_2, \ldots, x_n$ be a set of random
observations on the corresponding variables. To test whether $x_1, x_2, \ldots, x_n$
is a “random” sequence means to test the hypothesis that $X_1, X_2, \ldots, X_n$
are independently distributed and have identical distribution functions. This is 
in general a difficult problem, chiefly because of the large class of alternatives to 
the null hypothesis. 

Since the null hypothesis does not specify the distribution functions but only 
asserts their identity, the tests most generally sought have been such that their 
size is independent of the unknown (but identical for all the chance variables) 
distribution function. Certain reasonable procedures have been based on the 
numbers and lengths of runs up and down in the sequence. 

R. A. Fisher [16] suggested doing this, but gave no indication as to what 
statistic was to be used. Kermack and McKendrick [15] and Wallis and Moore 
[17] propose the following procedure, the former writers implicitly and the latter 
explicitly: Let

$$r'_p = \sum_{i=p}^{n-1} r_i$$

and denote by $\bar{z}$ the expectation of the general chance variable z. The proposed
statistic is

$$\sum_{i=1}^{p-1} \frac{(r_i - \bar{r}_i)^2}{\bar{r}_i} + \frac{(r'_p - \bar{r}'_p)^2}{\bar{r}'_p}$$


with the critical region the upper tail. Wallis and Moore recommend p = 3 and 
approximate the distribution by empirical methods. As we have seen above, 
Kermack and McKendrick err in ascribing to the statistic the chi-square distri- 
bution. 

The criticism has been made by Olmstead [19] that this statistic is insensitive 
to pronounced trends in the data. This is correct, and had been pointed out 
earlier in [17], where the prior removal of a trend is recommended. Since one of
the important problems of quality control is detection of a trend, this would limit
the usefulness of the statistic for quality control purposes. 

It happens frequently when a new rank statistic has been proposed for testing 
a non-parametric hypothesis such as that of “randomness” above, that critics of 
the proposed criterion construct sequences which, they say, appealing to “ordi-
nary common sense”, any reasonable statistic ought to place in the region of
rejection for almost any size of test. They then cheerfully point to the fact that 
the proposed statistic does not act in this reasonable fashion. A few remarks 
about this may not be amiss. 

A test for, say, “randomness”, which is to be made on the sequence of ranks, is 
really a numbering of the n! permutations of, say, the first n integers, according 
to the order in which they ought to be taken into the critical region in order to 
make the latter of any prescribed size. This numbering could even be done by
tabulating, for different n, the various sequences in their proper order. Aside 
from the obvious practical obstacles to such a tabulation, there would soon arise 
the difficulty that, after the “obvious” sequences are assigned their places, the
investigator would have difficulty in assigning to most of the remainder an order- 
ing according to the degree in which they may be held to “contradict” the null 
hypothesis. Resort is therefore made to a statistic which can be given as an 
analytic expression in the ranks. Because of the inadequacies of the theory the 
formula is often chosen by analogy with a similar formula in classical statistics. 
Difficulties may arise because of this. 

Let us examine for a moment this intuitive notion of reasonableness. Most 
people, and even most statisticians, would agree that the sequence of the first n 
integers in ascending order is an indication of non-randomness. The basis for
this notion is an intuitive conception of an alternative to the null hypothesis for 
which this sequence is very probable. The fact is, however, that if we admit all 
alternatives to the hypothesis of randomness, for any sequence of ranks whatever 
there exist infinitely many alternatives which assign to this sequence a probability 
of one. 

It seems to us that the difficulty can be met to a large extent by delimiting the 
class of distributions which constitute the alternatives to the null hypothesis, 
and by assigning to the admissible alternatives a weight function which measures 
the importance of the various alternatives (e.g., the financial loss caused by each). 
A profound treatment of this subject for the parametric case has been given by 
Wald [20]. This method has also the great merit that it removes the need for a 
choice of size of the region of rejection. 

In the control of the quality of mass production output one of the outstanding 
problems is to decide on the basis of a sequence of observations on the product 
whether the production process is in statistical control. Shewhart and his 
school of industrial statisticians base many of their tests on the sequence of 
ranks. On the basis of their experience they find that the causes which most 
often lead to a breakdown of statistical control are such as to cause shifts up and 
down in the level of the observations or trends in the observations. To detect
the former they have devised the technique of runs above and below the median
and to detect the latter they use runs up and down. Runs above and below the 
median may be described briefly as follows: The 2m + 1 (odd number) of ob- 
servations furnish a sequence of rankings from 1 to 2m +1. The elements 1 to 
m are considered to be elements of one kind and the elements m + 2 to 2m + 1 
elements of another kind. We then have a special case of runs of two kinds of 
elements. Limitations of space prevent the presentation of more detail or a
description of the ingenious scheme by which both kinds of runs are graphically 
exhibited. The reader is referred to [14], [5], and [21], among others. The tests 
used in the industrial applications are not always explicitly stated, nor do they 
always seem to be the same. The most common involve comparison of runs of 
various lengths with their expected number or else are based on the presence of 
abnormally long runs. 
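
The two kinds of runs just described can be extracted mechanically from a sequence of observations. The following is a minimal sketch; the function names and the sample `readings` are hypothetical, and ties between successive observations are assumed not to occur.

```python
from itertools import groupby
from statistics import median

def run_lengths(symbols):
    """Lengths of the maximal blocks of equal consecutive symbols."""
    return [len(list(group)) for _, group in groupby(symbols)]

def runs_about_median(values):
    """Run lengths above and below the median. An observation equal to the
    median is dropped, as rank m + 1 is in the odd-sample case of the text."""
    mid = median(values)
    return run_lengths([v > mid for v in values if v != mid])

def runs_up_and_down(values):
    """Run lengths of successive rises and falls (no ties assumed)."""
    return run_lengths([b > a for a, b in zip(values, values[1:])])

readings = [3, 5, 6, 9, 8, 7, 1, 2, 4]      # hypothetical ranks, 2m + 1 = 9
median_runs = runs_about_median(readings)   # [1, 4, 3]
trend_runs = runs_up_and_down(readings)     # [3, 3, 2]
```

The lists of run lengths are the raw material for either kind of test mentioned above: comparing the number of runs of each length with its expectation, or looking at the longest run.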

A pretty application of the theory may be found in Campbell [21]. The corrosion
of a copper plate was determined by a delicate mechanism which measured
the electrical resistance in various places on the plate. The rectangular plate 
was divided by rows and columns into forty small rectangles in each of which a 
measurement was made. The readings were made in each column in successive 
order from one end to the other, and the columns were also measured in succes- 
sive order from one edge to the other. The observations, when examined for 
runs above and below the median and runs up and down, indicated something 
amiss (“absence of statistical control”). Two causes were considered possible:

(a) variations, over the plate, in the corrosion of the copper; 

(b) malfunctioning of the delicate measuring apparatus. 

The runs obtained by arranging the observations in successive order according 
to positions on the plate might be expected to be associated with (a), while the 
runs obtained by arranging the observations in temporal order might be expected 
to be associated with (b). The object was therefore to separate the two order- 
ings and this was done as follows: The rectangles were numbered 1 to 40 in the 
order in which the first observations had been made and a random permutation 
of this sequence was used to indicate the order in which the next set of observa- 
tions was to be made. The second set was then ordered in two different ways, 
first according to the temporal order of the observations, and second according 
to the original ordering by positions. The runs above and below the median and 
the runs up and down, in the first ordering of the second set of observations gave 
evidence of a lack of statistical control, while those in the second ordering of the 
same set did not. An investigation located the trouble in the measuring apparatus.
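
The logic of separating the two orderings can be sketched as follows. Everything numerical here is hypothetical: the readings are generated from an assumed instrument drift that depends on time only, the seed is arbitrary, and `count_median_runs` is an illustrative helper rather than the test used in [21].

```python
import random
from itertools import groupby
from statistics import median

def count_median_runs(values):
    """Number of runs above and below the median of the sequence."""
    mid = median(values)
    marks = [v > mid for v in values if v != mid]
    return sum(1 for _ in groupby(marks))

cells = list(range(1, 41))        # rectangles, numbered in the first measuring order
rng = random.Random(0)            # arbitrary seed, for reproducibility only
visit_order = cells[:]
rng.shuffle(visit_order)          # random order for the second set of readings

# Assumed fault: the instrument drifts upward with time, whatever the cell.
second_set = [(visit_order[t], 100.0 + 0.5 * t) for t in range(40)]

temporal = [value for _, value in second_set]             # order of measurement
positional = [value for _, value in sorted(second_set)]   # order of position

runs_in_time = count_median_runs(temporal)
runs_in_space = count_median_runs(positional)
```

With a pure time drift the temporal ordering collapses into two long runs, while the positional ordering scrambles them; a cause tied to position on the plate would show the opposite pattern.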


3. Conclusion. The manifold achievements of quality control as it is prac- 
ticed at present point to the desirability of still further development of theory 
and practice. We conclude this paper by suggesting a few directions in which 
the theory of runs could develop and be of greater assistance in quality control. 

(1) The kinds of runs and the statistics used for making decisions in a production
process should be chosen on the basis of the kind of deviations from the
state of statistical control which the engineers consider most likely to occur.
It is very likely that different production processes may require different sta- 
tistical procedures. 

(2) General distribution theorems should be developed, power functions ob- 
tained, and the correlations between different tests investigated. 

(3) The application of the weight function idea of minimizing financial losses 
should be considered. 

In these developments both engineers and mathematical statisticians would 
have important and complementary roles. The tempo of progress will depend 
in large part on the cooperation between them. 


REFERENCES 


[1] S. S. Wilks, Mathematical Statistics, Princeton, 1943.
[2] A. M. Mood, Ann. Math. Stat., 11 (1940), p. 367.
[3] A. Wald and J. Wolfowitz, Ann. Math. Stat., 11 (1940), p. 147.
[4] Frieda S. Swed and C. Eisenhart, Ann. Math. Stat., 14 (1943), p. 66.
[5] Frederick Mosteller, Ann. Math. Stat., 12 (1941), p. 228.
[6] J. Wolfowitz, Ann. Math. Stat., 13 (1942), p. 247.
[7] L. von Bortkiewicz, Die Iterationen, Berlin, 1917.
[8] Richard von Mises, Zeitschrift für angewandte Mathematik und Mechanik, 1 (1921),
p. 298.
[9] J. Wishart and H. O. Hirshfeld, Journal of the London Math. Soc., 11 (1936), p. 227.
[10] W. G. Cochran, Quarterly Journal of the Roy. Meteorological Society, 64 (1938),
p. 631.
[11] E. Gold, Quarterly Journal of the Roy. Meteorological Society, 55 (1929), p. 307.
[12] W. L. Stevens, Annals of Eugenics, 9 (1939), p. 10.
[13] P. S. Olmstead, Ann. Math. Stat., 11 (1940), p. 363.
[14] W. A. Shewhart, “Contribution of statistics to the science of engineering”,
Proceedings of the Bicentennial Celebration of the Univ. of Penna., Philadelphia,
1941.
[15] W. O. Kermack and A. G. McKendrick, Proc. Roy. Soc., Edinburgh, 57 (1937), pp.
228-240, 332-376.
[16] R. A. Fisher, Quarterly Journal of the Roy. Meteorological Soc., 52 (1926), p. 250.
[17] W. A. Wallis and G. H. Moore, A Significance Test for Time Series Analysis, New
York, 1941.
[18] P. S. Olmstead, Journal of the American Stat. Assn., 1942, p. 152.
[19] W. J. Dixon, Ann. Math. Stat., 11 (1940), p. 199.
[20] A. Wald, Ann. Math. Stat., 10 (1939), p. 299.
[21] W. E. Campbell, “Use of statistical control in corrosion and contact resistance
studies”, Bell Tel. System Tech. Publications, 1942.





THE ACCURACY OF SAMPLING METHODS IN ECOLOGY 


By Paul G. Hoel
University of California at Los Angeles 


1. Introduction. For a number of years journals on ecology have contained 
articles on sampling techniques for estimating the distribution of common species 
of plants in various regions. Although much experimental work has been done 
on this problem and although the problem is essentially statistical in nature, no 
theoretical work of any consequence seems to have been attempted. This paper 
considers the question of the relative accuracy of common sampling methods 
from a theoretical point of view by means of geometrical probability and statisti- 
cal distribution theory. 

There are three common methods of sampling used by ecologists. They are 
designated by the names of coverage, abundance, and frequency. For each of 
these methods of sampling, there are two common choices of sampling unit, 
namely, the quadrat and the transect. By the coverage of a species in a region 
is meant the total area covered by the projection on the ground of the crowns of 
the plants of this species. By abundance is meant the total number of plants 
of this species in the region. By frequency is meant the number of sampling 
units in the region in which at least one plant of the species occurs. A quadrat 
is a sampling unit in the form of a square, usually several yards on a side. A 
transect is a sampling unit in the form of a straight line, coverage in this case
being the length of line covered by the projection of the crowns. 

In this paper it will be assumed that plants possess circular crowns. Further, 
it will be assumed that plant species distribute themselves at random in the 
region to be sampled. This is not necessarily the case, since there is often a 
tendency for plants of a given species to distribute themselves at random or 
otherwise in groups rather than as single plants. However, if sampling units 
are somewhat comparable in size, the relative accuracy of these methods of 
sampling based on a random distribution would be expected to hold fairly well 
for distributions somewhat removed from this ideal situation. Further, by the 
proper choice of sampling unit size, some non-random distributions behave very 
much as though they were random. 

The accuracy of a sampling method may be measured by the variance of the 
estimate of the quantity which is of interest. Here interest will be centered 
on the total coverage of a given species in the region being sampled. Thus, two 
sampling methods will be said to be equally accurate for coverage if they produce 
equal variances for the estimate of total coverage. 

The quadrat unit of sampling will be considered first for the three methods 
of sampling, after which the transect unit will be considered. 



2. Quadrat coverage. Let the region to be sampled be a square B units on
a side. Let there be n quadrats, each a square A units on a side, distributed
at random in the region. Finally, let the total number of plants of the species
in question in the region be N, with the distribution of the radius of their crowns 
given by a frequency function f(r) whose explicit form will be specified later. 

First, consider a single plant of radius r and a single quadrat. The problem 
is to determine the variance of a, the area of that part of the plant lying in the 
quadrat. Now the probability that this plant will be found in any particular 
part of the region is obtained by treating the plant as a circle of radius r which 
is thrown at random in the region and then applying geometrical probability 
to the position of the center of the circle. Thus, considering only those situations 
when the center of the circle lies in the region, the probability that the circle 
will cover an area of at least $a > \tfrac{1}{2}\pi r^2$ units of the quadrat is given by the ratio
of the area of the subregion inside the quadrat whose boundary is the locus of 
centers of circles of radius r which have precisely a units of their area inside 
the quadrat, to the area of the region. Probabilities of this type may be treated 
as functions of a. The expressions below for such probabilities follow directly 
from Fig. 1, which displays one corner of the quadrat. 


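$$
\begin{aligned}
P_1[a < \text{area} < \pi r^2] &= 4S_1/B^2, \qquad a > \tfrac{1}{2}\pi r^2,\\
P_2[0 < \text{area} < a] &= 4S_2/B^2, \qquad a < \tfrac{1}{2}\pi r^2,\\
P[a = \pi r^2] &= (A - 2r)^2/B^2,\\
P[a = 0] &= P_0.
\end{aligned}
\tag{1}
$$

Fig. 1

$$S_1 = (A - r)(r - z) - \int_z^r y\,dx, \tag{2}$$

where y is the ordinate of curve $C_1$. Likewise

$$S_2 = A(r + z) - z^2 + \tfrac{1}{4}\pi r^2 + \int_0^{-z+\sqrt{r^2 - z^2}} y'\,dx', \tag{3}$$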


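where y' is the ordinate of curve $C_2$ with respect to the primed axes and z is
negative. Using the formula for the area of a segment of a circle, the equation
of $C_1$ is easily found to be

$$x\sqrt{r^2 - x^2} + r^2\sin^{-1}\frac{x}{r} + y\sqrt{r^2 - y^2} + r^2\sin^{-1}\frac{y}{r} = a, \qquad x^2 + y^2 > r^2, \tag{4}$$

and

$$x\sqrt{r^2 - x^2} + r^2\sin^{-1}\frac{x}{r} + y\sqrt{r^2 - y^2} + r^2\sin^{-1}\frac{y}{r} + 2xy + \tfrac{1}{2}\pi r^2 = 2a, \qquad x^2 + y^2 < r^2, \tag{5}$$

where the value of a is given in terms of z by

$$z\sqrt{r^2 - z^2} + r^2\sin^{-1}\frac{z}{r} + \frac{\pi r^2}{2} = a. \tag{6}$$

The equation of $C_2$ is given by (5) with z negative. These equations do not
permit the solution of y in terms of x; however, they can be thrown into the
following parametric form with t as parameter:

$$
\begin{aligned}
x &= r\sin\left[\frac{t}{2} + \frac{1}{2}\cos^{-1}\left(\frac{a/r^2 - t}{\sin t}\right)\right],\\
y &= r\sin\left[\frac{t}{2} - \frac{1}{2}\cos^{-1}\left(\frac{a/r^2 - t}{\sin t}\right)\right],
\end{aligned}
\tag{4'}
$$

$$
\begin{aligned}
x &= r\sin\left[\frac{t}{2} + \frac{1}{2}\cos^{-1}\left(\frac{2a/r^2 - \pi/2 + \cos t - t}{1 + \sin t}\right)\right],\\
y &= r\sin\left[\frac{t}{2} - \frac{1}{2}\cos^{-1}\left(\frac{2a/r^2 - \pi/2 + \cos t - t}{1 + \sin t}\right)\right].
\end{aligned}
\tag{5'}
$$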
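
Equations (4) and (5) are convenient to check numerically. The sketch below evaluates the area of a circle lying in one corner of a quadrat from those two formulas and compares it with a direct quadrature over chords; the corner is placed at the origin with the quadrat occupying the first quadrant, and the function names are hypothetical.

```python
import math

def area_in_corner(x, y, r):
    """Area of a circle of radius r, centre (x, y), lying in X >= 0, Y >= 0,
    from equations (4) and (5); intended for |x|, |y| <= r."""
    def half(u):
        return u * math.sqrt(r * r - u * u) + r * r * math.asin(u / r)
    if x * x + y * y <= r * r:            # corner inside the circle: (5)
        return 0.5 * (half(x) + half(y) + 2 * x * y + 0.5 * math.pi * r * r)
    if x >= 0 and y >= 0:                 # corner outside the circle: (4)
        return half(x) + half(y)
    raise ValueError("centre outside the range covered by (4) and (5)")

def area_by_quadrature(x, y, r, steps=20000):
    """Independent check: sum the chord lengths lying in Y >= 0 over X >= 0."""
    lo, hi = max(0.0, x - r), x + r
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        X = lo + (i + 0.5) * h
        half_chord = math.sqrt(max(0.0, r * r - (X - x) ** 2))
        top, bottom = y + half_chord, max(0.0, y - half_chord)
        total += max(0.0, top - bottom) * h
    return total
```

Setting y = r in (4) removes the second edge from play and reproduces (6) with z = x, which gives a quick consistency check between the three equations.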


Since a may be treated as a parameter, equations (4) and (5), and hence (4')
and (5'), represent a system of curves $C_1$ and $C_2$. Unfortunately, equations
(4') and (5') are not convenient for integration purposes either, but they are
convenient for numerical work. This system of curves can be approximated
satisfactorily by means of simpler curves. One set of such approximating
curves is the following system of circles:

$$(x - r)^2 + (y - r)^2 = (r - z)^2, \qquad z > 0, \tag{7}$$

$$\left(x - \sqrt{r^2 - z^2}\right)^2 + \left(y - \sqrt{r^2 - z^2}\right)^2 = \left(-z + \sqrt{r^2 - z^2}\right)^2, \qquad z < 0. \tag{8}$$


Although inequalities may be obtained between the approximating and true
curves, these are of little value for determining the accuracy of essential moments
obtained by using these approximating curves; therefore the accuracy of these
approximating curves will be judged empirically by means of Fig. 2 in which
the true curves are plotted by means of (4') and (5') for z = .6, .3, 0, -.3, -.6,
-.9 of r with solid lines and the approximating circles (7) and (8) with broken
lines. Although the circles appear to fit poorly for relatively large positive
values of z, this is not serious because these values occur increasingly less often
than other values of z for a random circle and because the use of these circles
is confined to the rate of change of area bounded by these curves and the lines
x = r and y = r. Since the true curves are approaching the circles with decreasing
positive z, their rate of change of area would not differ much from that
for the circles even though the circles include larger areas for a given z. In the
paragraph following (11), further evidence will be presented to show that for the
computation of the first two moments of a, these curves give a good approximation.


Fig. 2 


For the purpose of obtaining the variance of a, consider the expected value of
$a^k$. Since the variable a may be thought of as the sum of three variables which
assume only the values 0, $\pi r^2$, and $0 < a < \pi r^2$, from (1) it follows that

$$E(a^k) = (\pi r^2)^k\,\frac{(A - 2r)^2}{B^2} + \int_{\pi r^2/2}^{\pi r^2} a^k f_1(a)\,da + \int_0^{\pi r^2/2} a^k f_2(a)\,da,$$

where $f_1(a)$ and $f_2(a)$ are the frequency functions for z > 0 and z < 0 respectively.
Now since

$$P_1[a < \text{area} < \pi r^2] = \int_a^{\pi r^2} f_1(a)\,da,$$

and

$$P_2[0 < \text{area} < a] = \int_0^a f_2(a)\,da,$$





it follows from (1), (2), and (3) that 


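$$f_1(a)\,da = -dP_1 = -\frac{4}{B^2}\frac{\partial S_1}{\partial z}\,dz = \frac{4}{B^2}\left[A - r + \frac{\partial}{\partial z}\int_z^r y\,dx\right]dz,$$

$$f_2(a)\,da = dP_2 = \frac{4}{B^2}\frac{\partial S_2}{\partial z}\,dz = \frac{4}{B^2}\left[A - 2z + \frac{\partial}{\partial z}\int_0^{-z+\sqrt{r^2 - z^2}} y'\,dx'\right]dz.$$

Using the approximating curves (7) and (8), these integrals become:

$$\int_z^r y\,dx = r(r - z) - \frac{\pi}{4}(r - z)^2,$$

$$\int_0^{-z+\sqrt{r^2 - z^2}} y'\,dx' = \left(1 - \frac{\pi}{4}\right)\left(-z + \sqrt{r^2 - z^2}\right)^2,$$

$$f_1(a)\,da = \frac{4}{B^2}\left[A - 2r\left(1 - \frac{\pi}{4}\right) - \frac{\pi}{2}z\right]dz,$$

$$f_2(a)\,da = \frac{4}{B^2}\left[A - 2z - 2\left(1 - \frac{\pi}{4}\right)\left(2\sqrt{r^2 - z^2} - \frac{r^2}{\sqrt{r^2 - z^2}}\right)\right]dz.$$

Hence,

$$
\begin{aligned}
E(a^k) = (\pi r^2)^k\,\frac{(A - 2r)^2}{B^2} &+ \frac{4}{B^2}\int_0^r a^k\left[A - 2r\left(1 - \frac{\pi}{4}\right) - \frac{\pi}{2}z\right]dz\\
&+ \frac{4}{B^2}\int_{-r}^0 a^k\left[A - 2z - 2\left(1 - \frac{\pi}{4}\right)\left(2\sqrt{r^2 - z^2} - \frac{r^2}{\sqrt{r^2 - z^2}}\right)\right]dz.
\end{aligned}
$$

Substituting the value of a from (6), standard integrals give the following
values for k = 1 and k = 2:

$$E(a) = \frac{\pi r^4}{B^2}\left[\left(\frac{A}{r}\right)^2 + .13\right], \tag{9}$$

$$E(a^2) = \frac{\pi^2 r^6}{B^2}\left[\left(\frac{A}{r}\right)^2 - 1.15\left(\frac{A}{r}\right) + .46\right], \tag{10}$$

where certain constants involving $\pi$ have been evaluated to two decimals.
If circles with centers outside the region but overlapping the region were
also measured, then geometrical probability would give the following value
for E(a):

$$E(a) = \frac{\pi r^4}{B^2}\left(\frac{A}{r}\right)^2. \tag{11}$$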







Since in (1) only circles with centers inside the region are assumed measured, 
E(a) will be only very slightly larger than this last value; consequently the
approximate result in (9) is only slightly in error. For a quadrat ten yards on 
a side and plants averaging two yards in diameter, the error is in the neighbor- 
hood of one tenth of one percent; consequently formula (10) may be expected 
to be quite accurate as well. Another approximating system of curves lying 
largely on the opposite side of the true curves from the circles gave formula 
(10) with .46 replaced by .26, both of which have a negligible effect on $E(a^2)$
for ordinary applications. 
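
The two-decimal constants in (9) and (10) can be recovered by numerical integration. The sketch below assumes the densities implied by the approximating circles (7) and (8), with the overlap area taken from (6); the function names, the midpoint rule, and the two trial values of A/r are choices made for this illustration.

```python
import math

def area_inside(z, r):
    """Equation (6): area inside the quadrat when the centre lies a signed
    distance z inside the nearest edge (-r < z < r)."""
    return z * math.sqrt(r * r - z * z) + r * r * math.asin(z / r) + 0.5 * math.pi * r * r

def midpoint_rule(f, lo, hi, steps=20000):
    """Plain midpoint quadrature; f is never evaluated at the end points."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def b2_times_moment(k, A, r):
    """B^2 * E(a^k) for a single circle of radius r and a quadrat of side A."""
    q = 1.0 - math.pi / 4.0

    def centre_inside(z):      # 0 < z < r
        return area_inside(z, r) ** k * (A - 2 * r * q - 0.5 * math.pi * z)

    def centre_outside(z):     # -r < z < 0
        root = math.sqrt(r * r - z * z)
        return area_inside(z, r) ** k * (A - 2 * z - 2 * q * (2 * root - r * r / root))

    whole_circle = (math.pi * r * r) ** k * (A - 2 * r) ** 2
    return (whole_circle
            + 4 * midpoint_rule(centre_inside, 0.0, r)
            + 4 * midpoint_rule(centre_outside, -r, 0.0))

r = 1.0
u1, u2 = 10.0, 20.0            # two arbitrary values of A / r

# Constant of (9): B^2 E(a) / (pi r^4) - (A/r)^2
const_9 = b2_times_moment(1, u1 * r, r) / (math.pi * r ** 4) - u1 ** 2

# Constants of (10): B^2 E(a^2) / (pi^2 r^6) = u^2 - c1 * u + c0
q1 = b2_times_moment(2, u1 * r, r) / (math.pi ** 2 * r ** 6) - u1 ** 2
q2 = b2_times_moment(2, u2 * r, r) / (math.pi ** 2 * r ** 6) - u2 ** 2
c1 = (q2 - q1) / (u1 - u2)
c0 = q1 + c1 * u1
```

For A/r = 10 the constant of (9) is to be compared with (A/r)^2 = 100, which is the source of the error of about one tenth of one percent quoted above.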

Formula (10) was derived on the assumption that the same circle was thrown 
repeatedly at random in the region. Consider now the situation when the 
circle varies in size according to the frequency function f(r). Treating a and r
as two statistical variables, their joint frequency function may be expressed as:

$$f(a, r) = f(r)\,f(a \mid r),$$

where f(a | r) is the frequency function of a when r has the fixed value r. Letting
$\mathcal{E}(a^k)$ represent the expected value of $a^k$ when r is permitted to vary according
to f(r),

$$
\begin{aligned}
\mathcal{E}(a^k) &= \iint a^k f(a, r)\,da\,dr\\
&= \int f(r)\int a^k f(a \mid r)\,da\,dr\\
&= \int f(r)\,E(a^k)\,dr,
\end{aligned}
$$

where all integrals are taken over the regions for which a and r are defined.
Consequently, from (10) and (11)

$$\mathcal{E}(a^2) = \frac{\pi^2}{B^2}\left[A^2\nu_4 - 1.15A\nu_5 + .46\nu_6\right],$$

and

$$\mathcal{E}(a) = \frac{\pi\nu_2 A^2}{B^2}, \tag{12}$$

where the $\nu$'s represent moments of r. Hence the variance of a is given by:

$$\sigma_a^2 = \frac{\pi^2}{B^2}\left[A^2\nu_4 - 1.15A\nu_5 + .46\nu_6 - A^4\nu_2^2/B^2\right]. \tag{13}$$

Finally, let there be n quadrats, N circles whose radii vary according to f(r),
and let the total area of quadrat covered by the N circles be denoted by s.
Then

$$\mathcal{E}(s) = nN\mathcal{E}(a), \tag{14}$$










and

$$\sigma_s^2 = nN\sigma_a^2. \tag{15}$$

The purpose of measuring s is to use it to obtain an estimate of T, the total area
of the N circles. But

$$T = N\mathcal{E}(\pi r^2) = N\pi\nu_2. \tag{16}$$

Substituting the value of $\nu_2$ from (12) and using (14),

$$T = B^2\mathcal{E}(s)/nA^2.$$

Hence an estimate of T will be given by

$$T_1 = B^2 s/nA^2. \tag{17}$$

Using (15) and (13), the variance of this estimate will be given by

$$\sigma_{T_1}^2 = \frac{\pi^2 B^2 N}{nA^2}\left[\nu_4 - 1.15\frac{\nu_5}{A} + .46\frac{\nu_6}{A^2} - \frac{A^2\nu_2^2}{B^2}\right]. \tag{18}$$

3. Quadrat abundance. In this method the sampler merely counts the num- 
ber of plants of the given species in each quadrat. Although this method was 
designed to estimate the total number of plants, it may be adapted to estimate 
total coverage as well. Since it is the practice to count a plant as lying in the 
quadrat only if its stem is in the quadrat, the probability that this event will occur 
is given by: 

$$P = A^2/B^2. \tag{19}$$

Since there are n quadrats and N circles, the number of circles with centers
lying in quadrats, which will be denoted by s, will follow the binomial distribution;
hence

$$\mathcal{E}(s) = nNP, \tag{20}$$

and

$$\sigma_s^2 = nNP(1 - P). \tag{21}$$

From (16) and (20) it follows that

$$T = \pi\nu_2\mathcal{E}(s)/nP.$$

Therefore an estimate of T will be given by

$$T_2 = \pi B^2 m_2 s/nA^2, \tag{22}$$

where $m_2$ is a sample estimate of $\nu_2$, obtained by measuring the diameters of k
plants and calculating their mean area. Since $m_2$ and s are independent, a
standard formula for the variance of a product of two independent variables
may be applied to give

$$\sigma_{T_2}^2 = \left[\frac{\pi B^2}{nA^2}\right]^2\left\{\mathcal{E}(m_2^2)\,\sigma_s^2 + \mathcal{E}^2(s)\,\sigma_{m_2}^2\right\},$$

where

$$\mathcal{E}(m_2^2) = \sigma_{m_2}^2 + \nu_2^2.$$

Consequently, with the aid of (19), (20), and (21)

$$\sigma_{T_2}^2 = \frac{\pi^2 N(B^2 - A^2)}{nA^2}\left[\sigma_{m_2}^2 + \nu_2^2\right] + T^2\,\frac{\sigma_{m_2}^2}{\nu_2^2}. \tag{23}$$
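
The standard product formula invoked in this section follows in two lines from independence. As a sketch, for independent variables X and Y (generic symbols, not notation from the paper):

$$
\begin{aligned}
\sigma_{XY}^2 &= E(X^2)E(Y^2) - E^2(X)E^2(Y)\\
&= E(X^2)\left[\sigma_Y^2 + E^2(Y)\right] - \left[E(X^2) - \sigma_X^2\right]E^2(Y)\\
&= E(X^2)\,\sigma_Y^2 + E^2(Y)\,\sigma_X^2 .
\end{aligned}
$$

Taking X as the sample estimate of the second moment of r and Y as the count s gives the expression in braces used for the variance of the abundance estimate.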


4. Quadrat frequency. In this method the sampler records the number of 
quadrats observed and the number of those quadrats which contained at least 
one plant of the given species. Given N plants, the probability p that at least 
one of them will be found in a given quadrat is given by 


$$p = 1 - (1 - P)^N,$$

where P is given in (19). For n quadrats the expected number of quadrats
in which at least one plant will be found is therefore np. Letting w represent
the number of such quadrats,

$$\mathcal{E}(w) = n\left[1 - (1 - P)^N\right].$$

Solving for N,

$$N = \log\left[1 - \frac{\mathcal{E}(w)}{n}\right] \Big/ \log[1 - P].$$

Consequently, from (16) an estimate of T will be given by

$$T_3 = \pi m_2 \log\left[1 - \frac{w}{n}\right] \Big/ \log[1 - P].$$

Neither the mean nor the variance of $T_3$ will exist because $T_3$ is a discrete variable
which becomes infinite for w = n. Unless the density of the species is very low,
values of w near n will occur quite often and hence cause $T_3$ to vary widely.
Consequently the frequency method will be inferior to the abundance method
except when the mean density is low, in which case the abundance method is
practically as easy to apply. Because the frequency method is obviously inferior
to the abundance method, it will not be considered further here.
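
The instability of the frequency estimate is easy to exhibit numerically. In the sketch below the function names are hypothetical, and the probability that every quadrat is occupied treats the n quadrats as independent, which is an assumption made only for this illustration.

```python
import math

def frequency_estimate(w, n, P, m2):
    """Estimate of total coverage from the number w of occupied quadrats;
    it is infinite when every one of the n quadrats is occupied."""
    if w == n:
        return math.inf
    return math.pi * m2 * math.log(1 - w / n) / math.log(1 - P)

def prob_all_occupied(n, N, P):
    """Chance that w = n, with quadrats treated as independent (assumed)."""
    p = 1 - (1 - P) ** N
    return p ** n

# Hypothetical survey: A = 10 and B = 100 give P = .01; N = 300 plants, n = 20 quadrats.
chance_of_infinite_estimate = prob_all_occupied(20, 300, 0.01)
```

With these assumed values a substantial fraction of samples would return an infinite estimate, which is the behaviour that rules the method out at moderate densities.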


5. Transect coverage. In this type of sampling a line is laid down and the 
length of line covered by a plant of the species in question is recorded. Let 
there be n such lines, each L units in length. 

If a circle of radius r is thrown at random in the region, it will cross a line
only if its center lies within the subregion, indicated in Fig. 3, composed of a
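rectangle of width 2r and length L with semi-circular ends. From this figure it
is clear that the probability of the circle intersecting some positive length less
than z of the line is given by four times the shaded area $s_1$, divided by the area
of the region. From this same diagram the following equations of the indicated
curves result:

Applying geometrical probability,

$$P[0 < \text{intercepted length} < z] = \frac{4s_1}{B^2} = \int_0^z f(z)\,dz,$$

where f(z) is the frequency function for z. But

$$s_1 = \frac{L}{2}\left[r - \sqrt{r^2 - z^2/4}\right] + z\sqrt{r^2 - z^2/4} + \int_{\sqrt{r^2 - z^2/4}}^{r}\sqrt{r^2 - y^2}\,dy.$$

Standard integrals give

$$s_1 = \frac{L}{2}\left[r - \sqrt{r^2 - z^2/4}\right] + \frac{3z}{4}\sqrt{r^2 - z^2/4} + \frac{r^2}{2}\sin^{-1}\frac{z}{2r}.$$

Consequently,

$$f(z)\,dz = dP = \frac{Lz + 8r^2 - 3z^2}{B^2\sqrt{4r^2 - z^2}}\,dz.$$

From this relation the following moments are readily obtained:

$$E(z) = \pi r^2 L/B^2,$$

$$E(z^2) = \left[\tfrac{16}{3}Lr^3 - \pi r^4\right]/B^2.$$

For variable r these formulas become:

$$\mathcal{E}(z) = \pi\nu_2 L/B^2,$$

$$\mathcal{E}(z^2) = \left[\tfrac{16}{3}L\nu_3 - \pi\nu_4\right]/B^2,$$

$$\sigma_z^2 = \left[\tfrac{16}{3}L\nu_3 - \pi\nu_4\right]/B^2 - \pi^2\nu_2^2L^2/B^4.$$

Let $\xi$ denote total z for N circles and n lines, then

$$\mathcal{E}(\xi) = nN\pi\nu_2 L/B^2, \tag{24}$$

and

$$\sigma_\xi^2 = nN\left\{\left[\tfrac{16}{3}L\nu_3 - \pi\nu_4\right]/B^2 - \pi^2\nu_2^2L^2/B^4\right\}.$$

From (16) and (24)

$$T = B^2\mathcal{E}(\xi)/nL.$$

Hence an estimate of T will be given by

$$T_4 = B^2\xi/nL, \tag{25}$$

and its variance will be given by

$$\sigma_{T_4}^2 = \frac{NB^2}{nL^2}\left[\tfrac{16}{3}L\nu_3 - \pi\nu_4\right] - \frac{\pi^2\nu_2^2 N}{n}. \tag{26}$$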
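
The moments of the intercepted length can be checked by numerical integration. The sketch below assumes the frequency function f(z) = (Lz + 8r^2 - 3z^2) / (B^2 sqrt(4r^2 - z^2)) on 0 < z < 2r, obtained by differentiating the shaded-area expression; the substitution z = 2r sin(theta) removes the end-point singularity. The function name and trial values are hypothetical.

```python
import math

def intercept_moment(k, r, L, B, steps=20000):
    """k-th moment of the intercepted length z for one circle and one line,
    by the midpoint rule in theta, where z = 2 r sin(theta)."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        s = math.sin((i + 0.5) * h)
        z = 2 * r * s
        # f(z) dz expressed in theta: (2 r L s + 8 r^2 - 12 r^2 s^2) / B^2 d(theta)
        weight = (2 * r * L * s + 8 * r * r - 12 * r * r * s * s) / B ** 2
        total += z ** k * weight * h
    return total

r, L, B = 1.0, 20.0, 100.0      # hypothetical radius, line length, region side
first_moment = intercept_moment(1, r, L, B)
second_moment = intercept_moment(2, r, L, B)
```

The zeroth moment of the same function is the probability that the circle meets the line at all, (2rL + pi r^2)/B^2, which is the area of the band with semi-circular ends divided by the area of the region.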


6. Transect abundance. Since the probability, P, of a circle of radius r
intersecting a line of length L is the area of the band with semi-circular ends
indicated in Fig. 3, divided by the area of the region,

$$P = (2rL + \pi r^2)/B^2.$$

Hence, letting s represent the total number of intersections, as in the case of
quadrat abundance,

$$E(s) = nNP,$$

$$E(s^2) = nNP(1 - P) + n^2N^2P^2,$$

$$\mathcal{E}(s) = nN[2L\nu_1 + \pi\nu_2]/B^2, \tag{27}$$

$$\mathcal{E}(s^2) = \frac{nN}{B^4}\left\{B^2[2L\nu_1 + \pi\nu_2] + (nN - 1)\left[4L^2\nu_2 + 4\pi L\nu_3 + \pi^2\nu_4\right]\right\}.$$

For simplicity of formula if nN - 1 is replaced by nN, the variance of s becomes

$$\sigma_s^2 = \frac{nN}{B^4}\left\{B^2[2L\nu_1 + \pi\nu_2] + nN\left[4L^2(\nu_2 - \nu_1^2) + 4\pi L(\nu_3 - \nu_1\nu_2) + \pi^2(\nu_4 - \nu_2^2)\right]\right\}. \tag{28}$$















From (16) and (27)

$$T = \frac{\pi\nu_2 B^2\,\mathcal{E}(s)}{n[2L\nu_1 + \pi\nu_2]}.$$

Hence an estimate of T will be given by

$$T_5 = \frac{\pi B^2 s}{n[\pi + 2L\alpha]},$$

where $\alpha$ is an estimate of $\nu_1/\nu_2$. In order to obtain a satisfactory estimate of
$\nu_1/\nu_2$, data for the distribution of diameters of common California shrubs were
analyzed. It was found that Pearson's type three curve gave an excellent fit.
Since the moments of this type distribution are given by

$$\nu_k = \rho^k\prod_{j=0}^{k-1}\left(1 + jV^2\right), \tag{29}$$


where $\rho$ is the mean and V is the coefficient of variation, $\sigma/\rho$, then $\nu_1/\nu_2 = 1/\rho\theta$,
where $\theta = 1 + V^2$, and the above estimate becomes

$$T_5 = \frac{B^2 s}{n}\cdot\frac{\pi\bar{r}\theta}{[\pi\rho\theta + 2L][1 + \varphi]}, \tag{30}$$

where $\varphi = \pi\theta[\bar{r} - \rho]/[\pi\rho\theta + 2L]$ and where $1/\bar{r}$ is chosen as an estimate of
$1/\rho$. Since $\bar{r}$ will be approximately normally distributed for samples considerably
smaller than those usually taken to find $\bar{r}$, assume that $\varphi$ is normally
distributed with mean zero and variance $\sigma^2\pi^2\theta^2/k[\pi\rho\theta + 2L]^2$. Since L is large relative
to $\sigma$ and since k will usually exceed twenty-five, this variance is very small,
and hence the probability of $\varphi$ exceeding one numerically is extremely small.
Although the value $\varphi = -1$ is theoretically possible on the normality assumption,
such a value would not permit the existence of either the mean or variance of
$1/[1 + \varphi]$. However, if $\varphi$ is restricted to a range of, say, ten standard deviations
about zero, then $|\varphi| < 1$ for ordinary conditions and the variance will exist.
Further, because $\varphi$ assumes such small values, with this finite range the variance
of $1/[1 + \varphi]$ is the same as the variance of $\varphi$ itself if higher powers in this variance
are neglected. Since s and $\varphi$ are independent, the same product formula that
was used for quadrat abundance may be employed here, together with the
various approximations indicated above, to yield

$$
\begin{aligned}
\sigma_{T_5}^2 = \left(\frac{N\pi\rho\theta}{\pi\rho\theta + 2L}\right)^2\Bigg\{&\left[\frac{B^2}{nN}(2L\nu_1 + \pi\nu_2) + 4L^2(\nu_2 - \nu_1^2) + 4\pi L(\nu_3 - \nu_1\nu_2) + \pi^2(\nu_4 - \nu_2^2)\right]\left[1 + \frac{\sigma^2\pi^2\theta^2}{k(2L + \pi\rho\theta)^2}\right]\\
&+ \frac{\sigma^2\pi^2\theta^2}{k(2L + \pi\rho\theta)^2}\,[2L\nu_1 + \pi\nu_2]^2\Bigg\}.
\end{aligned}
\tag{31}
$$










7. Comparison of methods. Formulas (18) and (23) may be compared for 
relative accuracy of these two quadrat methods of measuring coverage. For- 
mulas (26) and (31) may be compared for relative accuracy of these two transect 
methods of measuring coverage. Finally, formulas (18) and (26), and formulas 
(23) and (31), may be compared to determine what length transect will give the 
same accuracy as a quadrat of given size. All such comparisons will necessarily 
have to be done numerically by considering typical values for the parameters 
involved. The moments occurring in these formulas are expressible by means 
of (29) in terms of p and V if the form of f(r) is that assumed here. For the 
data analyzed to determine f(r) it was found that V was approximately 1/3.
These numerical comparisons will not be made here. 
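
One way such a numerical comparison might be organized is sketched below. It transcribes formulas (18) and (26) as read here (in particular the coefficient 16/3 in (26)), takes the moments from (29), and searches for the transect length whose variance matches that of a quadrat of given size. The function names, the bisection search, and every parameter value are hypothetical; no result from the paper is being reproduced.

```python
import math

def gamma_moment(k, rho, V):
    """nu_k from (29) for a Pearson type III distribution of radii."""
    value = rho ** k
    for j in range(k):
        value *= 1.0 + j * V * V
    return value

def var_quadrat_coverage(n, N, A, B, rho, V):
    """Formula (18), as read here."""
    nu = {k: gamma_moment(k, rho, V) for k in (2, 4, 5, 6)}
    bracket = (nu[4] - 1.15 * nu[5] / A + 0.46 * nu[6] / A ** 2
               - A ** 2 * nu[2] ** 2 / B ** 2)
    return math.pi ** 2 * B ** 2 * N / (n * A ** 2) * bracket

def var_transect_coverage(n, N, L, B, rho, V):
    """Formula (26), as read here."""
    nu = {k: gamma_moment(k, rho, V) for k in (2, 3, 4)}
    return (N * B ** 2 / (n * L ** 2) * (16.0 / 3.0 * L * nu[3] - math.pi * nu[4])
            - math.pi ** 2 * nu[2] ** 2 * N / n)

def matching_transect_length(n, N, A, B, rho, V):
    """Bisection for L; assumes the transect variance decreases with L
    between 2 * rho and B and brackets the quadrat variance there."""
    target = var_quadrat_coverage(n, N, A, B, rho, V)
    lo, hi = 2.0 * rho, B
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if var_transect_coverage(n, N, mid, B, rho, V) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical survey: 50 units, 2000 plants, quadrat side 10, region side 1000.
params = dict(n=50, N=2000, A=10.0, B=1000.0, rho=1.0, V=1.0 / 3.0)
L_match = matching_transect_length(**params)
```

The same pattern extends to (23) and (31) once a sample size k for the diameter measurements is chosen; the relative cost of laying a line of the matching length against marking out a quadrat is then the practical question.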

The question of which type of sampling method should be employed now 
becomes a question of balancing relative ease or cost of sampling against the size of
samples needed to produce equivalent accuracy as determined by means of
these formulas. If total frequency is desired rather than total coverage, these 
formulas may be altered to handle this situation as well. 





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of general interest 


Personal Items 


Dr. Charles C. Wagner has been named Assistant Dean of the School of Liberal 
Arts at the Pennsylvania State College. 

Miss Ruth E. Jolliffe has taken a position in the Graphic Analysis Department 
of Bell Aircraft Corporation. 

Mr. H. F. Hebley has been appointed Director of Research for the Pittsburgh 
Coal Company. 

Lt. F. W. Dresch, USNR, U.S. Naval Proving Ground, Dahlgren, Virginia, 
has been promoted to the rank of Lieutenant Commander. 

Mr. George F. Mayer is a Sergeant in the United States Army and is stationed 
at Fort Lewis, Washington. 

Captain A. C. Cohen, Jr. of Picatinny Arsenal has been promoted to the rank 
of Major. 


New Members 


The following persons have been elected to membership in the Institute: 

Bassford, Horace R. B.A. (Trinity Coll.) Actuary, Metropolitan Life Insurance Co., 1 
Madison Ave., New York, N. Y. 

Benson, Kathryn E. M.S. (Washington) Teaching Asst., Univ. of Calif., Berkeley, Calif. 

Blackadar, Walter L. B.A. (McMaster) Asso. Actuary, Equitable Life Assurance Society,
393 Seventh Ave., New York, N. Y. 

Buros, Asso. Prof. Oscar K. M.A. (Columbia) Rutgers Univ. (on leave), Captain, Signal 
Corps, A. U.S. 301 S. Courthouse Rd., Arlington, Va. 

Clinedinst, William O. M.E. (Carnegie Inst. Tech.) Eng., National Tube Co., Frick 
Bldg., Pittsburgh, Pa. 

Curry, Prof. Haskell B. Ph.D. (Göttingen) Penna. State Coll., State College, Pa., 6708
N. Sixth St., Philadelphia, Pa.

Dix, Margaret J. M.A. (Rice Institute) Sec., Univ. of Calif., Berkeley, Calif. 

Groth, Alton O. M.S. (Iowa) Asst. Actuary, Equitable Life Insurance Co. of Iowa, 
Des Moines, Iowa. 

Gurland, John M.A. (Toronto) Instr., Univ. of Toronto, Toronto, Canada. 97 Metcalfe 
St., Ottawa. 

Humm, Doncaster G. Ph.D. (Southern California) 1203 Commercial Exchange Bldg., 
416 W. Eighth St., Los Angeles, Calif. 

Jahn, Fred S. M.S. (Florida) General Manager, New Plastic Corp., 1017 N. Sycamore, 
Hollywood, Calif. 

Jeming, Joseph M.A. (Columbia) Captain, Army Air Corps. 3010 Valentine Ave., New 
York, N. Y.

Kavanagh, Arthur J. Ph.B. (Yale) Physicist, Spencer Lens Co., Buffalo, N. Y. 19 Doat 
St. 

Kennedy, Evelyn M. M.A. (Cincinnati) Industrial Economist, War Production Board, 
Washington, D.C. 1452 Fairmont St., NW. 

Lehmann, Eric L. M.A. (California) Asso., Univ. of Calif., Berkeley, Calif. 2514 Pied- 
mont Ave. 




Lew, Edward A. M.A. (Columbia) Asst. Actuary, Metropolitan Life Insurance Co., 1 
Madison Ave., New York, N. Y. 

Murphy, Ray D. A.B. (Harvard) Vice Pres. and Actuary, Equitable Life Assurance 
Society, New York, N. Y. 28 Godfrey Rd., Upper Montclair, N. J. 

Myers, James E. A.B. (Michigan) Leader-Statistical Analysis Group, Naval Res. Lab.,
Washington, D.C. 3014 Nichols Ave., SE.

O’Connor, Harry W. M.B.A. (Harvard) Stat., Sperry Gyroscope Co. Inc., Brooklyn,
N. Y. 387 Meadow Woods Rd., Great Neck.

Painter, Frank M., Jr. M.B.A. (Harvard) Statistics Supervisor, Sperry Gyroscope Co.,
Brooklyn, N. Y. 343 82nd St.

Salkind, William M.B.A. (Chicago) Asso. Stat., U.S. Dept. of Agric., Washington, D.C. 
2149 K St., NW. 

Simon, Leon G. Pension Consultant. 225 W. 34 St., New York, N. Y. 

Stewart, Oscar F. Statistics Supervisor, Sperry Gyroscope Co., Brooklyn, N. Y. 

Tucker, Ledyard R. B.S. (Colorado) Res. Asso., Univ. of Chicago, Chicago, Ill. 5456 
Greenwood Ave. 

Ullman, Joseph L. B.A. (Buffalo) Teaching Fellow, Mass. Inst. of Tech., Cambridge, 
Mass. 397 Jefferson Ave., Buffalo, N. Y. 


REPORT ON THE WASHINGTON MEETING OF THE INSTITUTE 


The fifteenth meeting of the Institute of Mathematical Statistics was held at 
George Washington University, June 17-19, 1943. About 200 persons including 
the following sixty-one members of the Institute attended one or more of the 
three evening sessions: 


T. W. Anderson, Jorge Arias, R. O. Been, H. R. Bellinson, B. M. Bennett, Richard 
Berger, Joseph Berkson, Felix Bernstein, Archie Blake, Dorothy S. Brady, W. G. Cochran, 
J. B. Coleman, Gertrude Cox, J. H. Curtiss, G. B. Dantzig, Besse B. Day, Robert Dorfman, 
H. F. Dorn, W. F. Elkin, W. D. Evans, R. H. Fadner, L. R. Frankel, M. A. Girshick, Harry 
H. Goode, C. H. Graves, T. N. Greville, F. E. Grubbs, Louis Guttman, Morris H. Hansen, 
W. A. Hendricks, W. N. Hurwitz, Walter Jacobs, Rachel M. Jenss, A. J. King, G. B. King, 
Lila F. Knudsen, H. S. Konijn, Solomon Kullback, H. G. Landau, J. E. Lieberman, W. G.
Madow, Sophie Marcuse, J. W. Mauchly, A. M. Mood, Harold Nisselson, Monroe L. Norden,
H. W. Norton, A. C. Rosander, David Rosenblatt, P. J. Rulon, Marion Sandomire, W. A. 
Shelton, Harry Shulman, J. H. Smith, G. W. Snedecor, F. F. Stephan, Alice Sternberg, 
Benjamin Tepping, J. W. Tukey, C. R. M. Tuttle, F. M. Weida. 


The following program, arranged by Dr. W. G. Madow, was held: 


THURSDAY, JUNE 17 AT 8:00 P.M. 
APPLICATIONS OF SAMPLING THEORY 


Chairman, Professor Frank M. Weida, George Washington University


1. Some Recent Developments in the Application of Sampling Theory in Agriculture 
Arnold J. King, Iowa State College and Department of Agriculture; Walter A. 
Hendricks, North Carolina State College and Department of Agriculture 

2. The Relative Efficiency of Block Samples in Housing Surveys 
Lester R. Frankel and William J. Cobb, Bureau of the Census 

3. The Optimum Size of Sampling Units 
Dorothy Cruden and Alice Sternberg, Bureau of the Census 








































PITTSBURGH CHAPTER 


FRIDAY, JUNE 18 AT 8:00 P.M.
RECENT DEVELOPMENTS IN STATISTICAL THEORY


Chairman, Professor George W. Snedecor, Iowa State College


1. On Some Recent Developments in Sampling Theory
Morris H. Hansen, William N. Hurwitz, and William G. Madow, Bureau of the
Census and Office of Price Administration

2. On the Variance of Estimates Arising from Stratified Samples
Frederick F. Stephan, War Manpower Commission

3. Statistical Techniques for the Comparison of Different Scales of Measurement
William G. Cochran, Iowa State College

4. Adjustments for Differential Refusal Rates in Samples of Human Populations
Jerome Cornfield, Bureau of Labor Statistics

5. On the Verification of Weather Forecasts
Horace W. Norton, Weather Bureau


SATURDAY, JUNE 19 AT 8:00 P.M.
SOME PROBLEMS IN STATISTICS


Chairman, Colonel Leslie E. Simon, War Department


1. The Application of Statistical Methods in Acceptance Inspection
Harold Bellinson, War Department

2. The Distribution of the Radial Standard Deviation
Captain Frank E. Grubbs, War Department

3. Some Results in Tests of Randomness
M. A. Girshick, Department of Agriculture

4. Corrections for Groupings
John H. Smith, Bureau of Labor Statistics

5. On Group Blood Testing
Robert E. Dorfman, Office of Price Administration

Edwin G. Olds,
Secretary


REPORT ON THE FIRST MEETING OF THE PITTSBURGH
CHAPTER OF THE INSTITUTE

The first meeting of the Pittsburgh Chapter of the Institute of Mathematical 
Statistics was held at Carnegie Institute of Technology on Saturday, June 19, 
1943. Thirty-six persons attended the meeting, including the following ten 
members of the Institute: 
Shirley Bernstein, M. A. Brumbaugh, Karl Fetters, H. J. Hand, G. E. Niver, F. G. 
Norris, E. G. Olds, R. F. Passano, E. M. Schrock, R. W. Shephard. 


Morning and afternoon sessions were devoted to a round-table discussion of 
present industrial uses of statistical methods. Mr. R. F. Passano, Bethlehem 
Steel Co., led the discussion. Mr. F. G. Norris, Wheeling Steel Corp., acted as 
chairman of the sessions. 




The Pittsburgh Chapter was formed from the Society of Industrial Quality 
Statisticians, which has held meetings at Carnegie Institute of Technology since 
1941, with the object of providing a symposium for those interested in industrial 
applications. The Constitution of the Pittsburgh Chapter was ratified at the 
meeting. The object of the Chapter is to foster the advancement of mathematical 
statistics and to promote its application to industrial problems. 

The following officers for the Chapter for 1943 were elected: 


President, F. G. Norris, Wheeling Steel Corp. 
Vice President, K. L. Fetters, Carnegie Institute of Technology 
Sect.-Treas., H. J. Hand, National Tube Co. 
Sponsor, E. G. Olds, Carnegie Institute of Technology 
Board Members, R. F. Passano, Bethlehem Steel Co. 
               J. Manuele, Westinghouse Electric & Mfg. Co. 


Howard Hand, 
Secretary of the Pittsburgh Chapter 








